Description



BIOINFORMATICALLY DETECTABLE 
GROUP OF NOVEL REGULATORY 
BACTERIAL AND BACTERIAL 
ASSOCIATED OLIGONUCLEOTIDES AND 
USES THEREOF 

CROSS REFERENCE TO RELATED APPLICATIONS 

[0001] This application is a continuation in part of U.S. Patent Ap-
plication Serial No. 10/708,951, filed 2-Apr-04, entitled
"Bioinformatically Detectable Group of Novel Regulatory
Bacterial and Bacterial Associated Oligonucleotides and
Uses Thereof", the disclosure of which is hereby incorpo-
rated by reference, and claims priority therefrom. This ap-
plication also is a continuation in part of U.S. Provisional
Patent Application Serial No. 60/521,433, filed 26-Apr-04,
entitled "A Microarray for the Detection of MicroRNA
Oligonucleotides", the disclosure of which is hereby incor-
porated by reference, and claims priority therefrom.
Background of the invention

FIELD OF THE INVENTION 

[0018] The present invention relates to a group of bioinformati- 
cally detectable novel bacterial oligonucleotides and to a 
group of bioinformatically detectable novel human 
oligonucleotides associated with bacterial infections, both 
are identified here as "Genomic Address Messenger" 
(GAM) oligonucleotides. 

[0019] All of the abovementioned oligonucleotides are believed to be
related to the microRNA (miRNA) group of oligonu- 
cleotides. 
DESCRIPTION OF PRIOR ART 

[0020] miRNA oligonucleotides are short, ~22 nucleotide
(nt)-long, non-coding, regulatory RNA oligonucleotides
that are found in a wide range of species. miRNA
oligonucleotides are believed to function as specific gene
translation repressors and are sometimes involved in cell
differentiation.

[0021] The ability to detect novel miRNA oligonucleotides is lim- 
ited by the methodologies used to detect such oligonu- 
cleotides. All miRNA oligonucleotides identified so far ei- 
ther present a visibly discernable whole body phenotype, 
as do Lin-4 and Let-7 (Wightman, B., Ha, I., and Ruvkun, G.,
Cell 75: 855-862 (1993); Reinhart et al. Nature 403: 
901-906 (2000)), or produce sufficient quantities of RNA 
so as to be detected by standard molecular biological 
techniques. 

[0022] Ninety-three miRNA oligonucleotides have been discov- 
ered in several species (Lau et al., Science 294: 858-862 
(2001), Lagos-Quintana et al., Science 294: 853-858 
(2001)) by sequencing a limited number of clones (300 by 
Lau and 100 by Lagos-Quintana) of size-fractionated 
small segments of RNA. miRNAs that were detected in 
these studies therefore represent the more prevalent 
among the miRNA oligonucleotide family and cannot be 
much rarer than 1% of all small ~20 nt-long RNA oligonu- 
cleotides. 
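
The sampling argument above can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that each sequenced clone is an independent draw from the pool of small RNA segments, so that an oligonucleotide present at fraction f is seen at least once among n clones with probability 1 - (1 - f)^n. The function name and the independence assumption are illustrative and are not taken from the cited studies.

```python
def detection_probability(fraction: float, clones: int) -> float:
    """Probability of seeing a species at least once in `clones` independent draws.

    `fraction` is the assumed share of the species among all small RNA segments.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if clones < 0:
        raise ValueError("clones must be non-negative")
    # Complement of the probability that every draw misses the species.
    return 1.0 - (1.0 - fraction) ** clones


if __name__ == "__main__":
    # Clone counts quoted in the text: 300 (Lau) and 100 (Lagos-Quintana).
    for clones in (300, 100):
        for fraction in (0.01, 0.001):
            p = detection_probability(fraction, clones)
            print(f"{clones} clones, abundance {fraction:.1%}: P(detect) = {p:.2f}")
```

Under this assumption, a species at 1% abundance is very likely to appear among 300 clones (about 0.95), whereas a species at 0.1% abundance is more likely to be missed than found (about 0.26).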

[0023] The aforementioned studies provide no basis for the
detection of miRNA oligonucleotides which either do not
present a visually discernable whole body phenotype, or
are rare (e.g. rarer than 0.1% of all of the
size-fractionated, ~20 nt-long RNA segments that were
expressed in the tissues examined), and therefore do not
produce large enough quantities of RNA to be detected by
standard biological techniques.
[0024] To date, miRNA oligonucleotides have not been detected
in bacteria. 

[0025] The following U.S. Patents relate to bioinformatic
detection of genes: U.S. Patent No. 348935, entitled "Statistical
algorithms for folding and target accessibility prediction
and design of nucleic acids", U.S. Patent No. 6,369,195,
entitled "Prostate-specific gene for diagnosis, prognosis
and management of prostate cancer", and U.S. Patent
No. 6,291,666, entitled "Spike tissue-specific promoter",
each of which is hereby incorporated by reference herein. 
BRIEF DESCRIPTION OF SEQUENCE LISTING, TABLES AND
COMPUTER PROGRAM LISTING

[0026] A sequence listing comprising 4,254,670 genomic sequences
is attached to the present invention, is contained in
a file named SEQ_LIST.txt (720288KB, 18-May-04), and is
hereby incorporated by reference herein.



[0027] Tables relating to genomic sequences are attached to the
present application and appear in the following files (size,
creation date) included on CD, incorporated herein:
TABLE_1.txt (28.3 MB, 18-May-04), TABLE_2.txt (350 MB,
18-May-04), TABLE_3.txt (5.64 MB, 18-May-04),
TABLE_4.txt (17.1 MB, 18-May-04), TABLE_5.txt (5.04 MB,
18-May-04), TABLE_6.txt (536 MB, 18-May-04),
TABLE_7_A.txt (619 MB, 18-May-04), TABLE_7_B.txt (340
MB, 18-May-04), TABLE_8_A.txt (619 MB, 18-May-04),
TABLE_8_B.txt (619 MB, 18-May-04), TABLE_8_C.txt (619
MB, 18-May-04), TABLE_8_D.txt (457 MB, 18-May-04),
TABLE_9.txt (654 MB, 18-May-04), TABLE_10.txt (49.1
MB, 18-May-04), and TABLE_11.txt (79.8 MB,
18-May-04), all of which are incorporated by reference
herein. Further, additional tables relating to genomic
sequences are attached to the present application and appear in
the following files (size, creation date) attached to the
application, incorporated herein: TABLE_12.txt (41.1 KB,
18-May-04) and TABLE_13.txt (46.9 KB, 18-May-04), which are
incorporated by reference herein.

[0028] A computer program listing constructed and operative in
accordance with a preferred embodiment of the present
invention is enclosed on an electronic medium in
computer readable form, and is hereby incorporated by
reference herein. The computer program listing is contained in
7 files, the names, sizes and creation dates of which are as
follows: AUXILARY_FILES.txt (117K, 14-Nov-03);
EDIT_DISTANCE.txt (144K, 24-Nov-03); FIRST-K.txt (96K,
24-Nov-03); HAIRPIN_PREDICTION.txt (19K, 25-Mar-04);
TWO_PHASED_SIDE_SELECTOR.txt (4K, 14-Nov-03);
TWO_PHASED_PREDICTOR.txt (74K, 14-Nov-03); and
BS_CODE.txt (118K, 11-May-04).
Summary of the invention 

[0029] The present invention relates to a novel group of 3,873 
bioinformatically detectable bacterial regulatory RNA 
oligonucleotides, which repress expression of human tar- 
get genes, by means of complementary hybridization to 
binding sites in untranslated regions of these target 
genes. It is believed that this novel group of bacterial 
oligonucleotides represents a pervasive bacterial mecha- 
nism of attacking a host, and therefore knowledge of this 
novel group of bacterial oligonucleotides may be useful in 
preventing and treating bacterial diseases. 

[0030] Additionally, the present invention relates to a novel
group of 4,363 bioinformatically detectable human regu-
latory RNA oligonucleotides, which repress expression of
human target genes associated with the bacterial infec- 
tion, by means of complementary hybridization to binding 
sites in untranslated regions of these target genes. It is 
believed that this novel group of human oligonucleotides 
represents a pervasive novel host response mechanism, 
and therefore knowledge of this novel group of human 
oligonucleotides may be useful in preventing and treating 
bacterial diseases. 

[0031] Furthermore, the present invention relates to a novel 
group of 24,160 bioinformatically detectable bacterial 
regulatory RNA oligonucleotides, which repress expres- 
sion of bacterial target genes, by means of complemen- 
tary hybridization to binding sites in untranslated regions 
of these bacterial target genes. It is believed that this 
novel group of bacterial oligonucleotides represents a 
pervasive novel internal bacterial regulation mechanism, 
and therefore knowledge of this novel group of bacterial 
oligonucleotides may be useful in preventing and treating 
bacterial diseases. 

[0032] In addition, the present invention relates to a novel group
of 6,100 bioinformatically detectable human regulatory
RNA oligonucleotides, which repress expression of bacte-
rial target genes, by means of complementary hybridiza-
tion to binding sites in untranslated regions of these bac-
terial target genes. It is believed that this novel group of
human oligonucleotides represents a pervasive novel anti- 
bacterial host defense mechanism, and therefore knowl- 
edge of this novel group of human oligonucleotides may 
be useful in preventing and treating bacterial diseases. 

[0033] Also disclosed are 6,056 novel microRNA-cluster like bac- 
terial polynucleotides and 430 novel microRNA-cluster 
like human polynucleotides, both referred to here as Ge- 
nomic Record (GR) polynucleotides. 

[0034] In various preferred embodiments, the present invention
seeks to provide an improved method and system for detec-
tion and prevention of bacterial diseases, which are medi-
ated by this group of novel oligonucleotides.

[0035] Accordingly, the invention provides several substantially 
pure nucleic acids (e.g., genomic DNA, cDNA or synthetic 
DNA) each comprising a novel GAM oligonucleotide, vec- 
tors comprising the DNAs, probes comprising the DNAs, a 
method and system for selectively modulating translation 
of known target genes utilizing the vectors, and a method 
and system utilizing the GAM probes to modulate expres- 
sion of GAM target genes. 

[0036] The present invention represents a scientific break-
through, disclosing novel miRNA-like oligonucleotides the
number of which is dramatically larger than previously be-
lieved to exist. Prior-art studies reporting miRNA oligonu-
cleotides (Lau et al., Science 294: 858-862 (2001), Lagos-
Quintana et al., Science 294: 853-858 (2001)) discovered
93 miRNA oligonucleotides in several species, including 
21 in human, using conventional molecular biology meth- 
ods, such as cloning and sequencing. 
[0037] Molecular biology methodologies employed by these 
studies are limited in their ability to detect rare miRNA 
oligonucleotides, since these studies relied on sequencing 
of a limited number of clones (300 clones by Lau and 100 
clones by Lagos-Quintana) of small segments (i.e. size- 
fractionated) of RNA. miRNA oligonucleotides detected in 
these studies therefore represent the more prevalent
among the miRNA oligonucleotide family, and are typically
not much rarer than 1% of all small ~20 nt-long RNA
oligonucleotides present in the tissue from which the RNA was
extracted.

[0038] Recent studies state the number of miRNA oligonu- 
cleotides to be limited, and describe the limited sensitivity 
of available methods for detection of miRNA oligonu-
cleotides: "The estimate of 255 human miRNA oligonu-
cleotides is an upper bound implying that no more than
40 miRNA oligonucleotides remain to be identified in 
mammals" (Lim et al., Science, 299:1540 (2003)); "Esti- 
mates place the total number of vertebrate miRNA genes 
at about 200-250" (Ambros et al. Curr. Biol. 13:807-818 
(2003)); and "Confirmation of very low abundance miRNAs 
awaits the application of detection methods more sensi- 
tive than Northern blots" (Ambros et al. Curr. Biol. 
13:807-818 (2003)). 

[0039] The oligonucleotides of the present invention represent a 
revolutionary new dimension of genomics and of biology: 
a dimension comprising a huge number of non-pro- 
tein-coding oligonucleotides which modulate expression 
of thousands of proteins and are associated with numer- 
ous major diseases. This new dimension disclosed by the 
present invention dismantles a central dogma that has 
dominated life-sciences during the past 50 years, a 
dogma which has emphasized the importance of protein- 
coding regions of the genome, holding non-pro- 
tein-coding regions to be of little consequence, often 
dubbing them "junk DNA". 

[0040] Indeed, only in November, 2003 has this long held belief
as to the low importance of non-protein-coding regions
been vocally challenged. As an example, an article titled
"The Unseen Genome - Gems in the Junk" (Gibbs, W.W.
Sci. Am. 289:46-53 (2003)) asserts that the failure to rec-
ognize the importance of non-protein-coding regions
"may well go down as one of the biggest mistakes in the 
history of molecular biology." Gibbs further asserts that 
"what was damned as junk because it was not understood, 
may in fact turn out to be the very basis of human com- 
plexity." The present invention provides a dramatic leap in 
understanding specific important roles of non-pro- 
tein-coding regions. 

[0041] An additional scientific breakthrough of the present in- 
vention is a novel conceptual model disclosed by the 
present invention, which conceptual model is preferably 
used to encode in a genome the determination of cell dif- 
ferentiation, utilizing oligonucleotides and polynu- 
cleotides of the present invention. 

[0042] Using the bioinformatic engine of the present invention,
21,916 bacterial GAM oligonucleotides and their respec- 
tive precursors and targets have been detected and 6,100 
human GAM oligonucleotides and their respective precur- 
sors and targets have been detected. These bioinformatic 
predictions are supported by robust biological studies.
Microarray experiments validated expression of 346 of the
human GAM oligonucleotides of the present invention. Of 
these, 311 received an extremely high score: over six 
standard deviations higher than the background "noise" of 
the microarray, and over two standard deviations above 
their individual "mismatch" control probes and 33 re- 
ceived a high score: over four standard deviations higher 
than the background "noise" of the microarray. Further, 
38 GAM oligonucleotides were sequenced. 
[0043] In various preferred embodiments, the present invention
seeks to provide an improved method and system for 
specific modulation of the expression of specific target 
genes involved in significant human diseases. It also pro- 
vides an improved method and system for detection of the 
expression of novel oligonucleotides of the present inven- 
tion, which modulate these target genes. In many cases, 
the target genes may be known and fully characterized;
however, in alternative embodiments of the present inven-
tion, unknown or less well characterized genes may be
targeted. 

[0044] A "Nucleic acid" is defined as a ribonucleic acid (RNA)
molecule, or a deoxyribonucleic acid (DNA) molecule, or
complementary deoxyribonucleic acid (cDNA), comprising
either naturally occurring nucleotides or non-naturally oc-
curring nucleotides.
[0045] "Substantially pure nucleic acid", "Isolated Nucleic Acid",
"Isolated Oligonucleotide" and "Isolated Polynucleotide" are
defined as a nucleic acid that is free of the genome of the 
organism from which the nucleic acid is derived, and in- 
clude, for example, a recombinant nucleic acid which is 
incorporated into a vector, into an autonomously replicat- 
ing plasmid or virus, or into the genomic nucleic acid of a 
prokaryote or eukaryote at a site other than its natural 
site; or which exists as a separate molecule (e.g., a cDNA 
or a genomic or cDNA fragment produced by PCR or re- 
striction endonuclease digestion) independent of other 
nucleic acids. 

[0046] An "Oligonucleotide" is defined as a nucleic acid compris- 
ing 2-139 nts, or preferably 16-120 nts. A "Polynu- 
cleotide" is defined as a nucleic acid comprising 
140-5000 nts, or preferably 140-1000 nts. 

[0047] A "Complementary" sequence is defined as a first nu-
cleotide sequence which is the reverse complement of a
second nucleotide sequence: the first nucleotide sequence
is reversed relative to the second nucleotide sequence, and
each nucleotide in the first nucleotide sequence is
complementary to a corresponding nucleotide in the sec-
ond nucleotide sequence (e.g. ATGGC is the complemen-
tary sequence of GCCAT).
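
The definition above can be illustrated with a minimal sketch. It assumes DNA letters (A, C, G, T) written 5' to 3'; the function names are hypothetical and are not part of the computer program listing of the present invention.

```python
# Watson-Crick pairing for DNA letters; an RNA variant would pair A with U.
_DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    try:
        # Walk the sequence backwards and swap each base for its partner.
        return "".join(_DNA_COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"unexpected nucleotide: {err.args[0]!r}") from None


def is_complementary(first: str, second: str) -> bool:
    """True when `first` is the reverse complement of `second`."""
    return first.upper() == reverse_complement(second)
```

With the example from the definition, `reverse_complement("GCCAT")` yields `"ATGGC"`, so `is_complementary("ATGGC", "GCCAT")` holds.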
[0048] "Hybridization", "Binding" and "Annealing" are defined as 
hybridization, under in vivo physiological conditions, of a 
first nucleic acid to a second nucleic acid, which second 
nucleic acid is at least partially complementary to the first 
nucleic acid. 

[0049] A "Hairpin Structure" is defined as an oligonucleotide hav-
ing a nucleotide sequence that is 50-140 nts in length,
the first half of which nucleotide sequence is at least par- 
tially complementary to the second part thereof, thereby 
causing the nucleic acid to fold onto itself, forming a sec- 
ondary hairpin structure. 
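
As a rough illustration of this definition, the sketch below checks the stated length range and measures how much of the first half pairs with the mirrored position in the second half. It is a simplification under stated assumptions: only canonical A-U and G-C pairs are counted, bulges and loops are not modelled, and the 0.6 threshold is an arbitrary illustrative value. It is not the hairpin prediction program of the present invention, and realistic folding would rely on thermodynamic structure prediction.

```python
# Canonical RNA base pairs; G-U wobble pairs are deliberately left out here.
_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_fraction(seq: str) -> float:
    """Fraction of positions in the first half that pair with the mirrored
    position in the second half (no bulges or loops are modelled)."""
    rna = seq.upper().replace("T", "U")
    half = len(rna) // 2
    if half == 0:
        return 0.0
    # Position i from the 5' end is compared with position i from the 3' end.
    pairs = sum((rna[i], rna[-1 - i]) in _PAIRS for i in range(half))
    return pairs / half


def looks_like_hairpin(seq: str, min_fraction: float = 0.6) -> bool:
    """Length test from the definition (50-140 nt) plus a crude pairing test.
    `min_fraction` is an illustrative threshold, not a value from the text."""
    return 50 <= len(seq) <= 140 and paired_fraction(seq) >= min_fraction
```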

[0050] A "Hairpin-Shaped Precursor" is defined as a Hairpin
Structure which is processed by a Dicer enzyme complex,
yielding an oligonucleotide which is about 19 to about 24 
nts in length. 

[0051] "Inhibiting translation" is defined as the ability to prevent
synthesis of a specific protein encoded by a respective
gene by means of inhibiting the translation of the mRNA
of this gene. For example, inhibiting translation may in-
clude the following steps: (1) a DNA segment encodes an
RNA, the first half of whose sequence is partially comple-
mentary to the second half thereof; (2) the precursor folds
onto itself forming a hairpin-shaped precursor; (3) a Dicer 
enzyme complex cuts the hairpin-shaped precursor yield- 
ing an oligonucleotide that is approximately 22 nt in 
length; (4) the oligonucleotide binds complementarily to 
at least one binding site, having a nucleotide sequence 
that is at least partially complementary to the oligonu- 
cleotide, which binding site is located in the mRNA of a 
target gene, preferably in the untranslated region (UTR) of 
a target gene, such that the binding inhibits translation of 
the target protein. 
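
Step (4) above, in which the oligonucleotide binds a site that is at least partially complementary to it, can be sketched as a sliding-window scan over a UTR sequence. This is a minimal illustration under stated assumptions: pairing is ungapped, only canonical A-U and G-C pairs count, and the 0.8 default for `min_fraction` is a hypothetical threshold rather than a value from the present invention. The function name is likewise illustrative.

```python
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_binding_sites(oligo: str, utr: str, min_fraction: float = 0.8):
    """Return (start, fraction) for each UTR window that is at least
    `min_fraction` complementary to the oligonucleotide.

    Both sequences are read 5' to 3'; the oligo pairs antiparallel to the
    target, so each window is compared against the oligo's reverse complement.
    """
    oligo = oligo.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    if not oligo:
        return []
    # Sequence a perfectly complementary site would have, read 5' to 3'.
    ideal_site = "".join(_RNA_COMPLEMENT[b] for b in reversed(oligo))
    n = len(ideal_site)
    sites = []
    for start in range(len(utr) - n + 1):
        window = utr[start:start + n]
        matches = sum(a == b for a, b in zip(window, ideal_site))
        if matches / n >= min_fraction:
            sites.append((start, matches / n))
    return sites
```

Lowering `min_fraction` admits sites with more mismatches, which corresponds to the "at least partially complementary" wording; how much mismatch is tolerated in practice is not specified here.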

[0052] A "Translation inhibitor site" is defined as the minimal nu-
cleotide sequence sufficient to inhibit translation.

[0053] The present invention describes novel GAM oligonu-
cleotides, detected using a bioinformatic engine described
hereinabove. The ability of this detection engine has been
demonstrated using stringent algorithmic criteria, show-
ing that the engine has both high sensitivity, indicated by
the high detection rate of published miRNA oligonu-
cleotides and their targets, as well as high specificity, in-
dicated by the low number of "background" hairpin candi-
dates passing its filters. Laboratory tests, based both on
sequencing of predicted GAM oligonucleotides and on mi-
croarray experiments, validated 381 of the GAM oligonu-
cleotides in the present invention. Further, almost all of
the bacterial target genes (6,141 of the 7,351) and almost 
all of the human target genes (64 out of 76) described in 
the present invention are bound by one or more of the 
381 human GAM oligonucleotides validated by the mi- 
croarray experiments. 

[0054] There is thus provided in accordance with a preferred em- 
bodiment of the present invention a bioinformatically de- 
tectable isolated oligonucleotide which is endogenously 
processed from a hairpin-shaped precursor, and anneals 
to a portion of a mRNA transcript of a target gene, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs: 1-385 and 386-49787. 
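
The "at least 80% sequence identity" criterion can be illustrated with a simple calculation. The text does not state how identity is computed, so the sketch below assumes an ungapped, position-by-position comparison normalized by the longer sequence; alignment-based measures that allow gaps may give different values. The function names are hypothetical.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity, comparing position by position and
    dividing by the longer length so extra bases count as mismatches."""
    a, b = candidate.upper(), reference.upper()
    longest = max(len(a), len(b))
    if longest == 0:
        raise ValueError("both sequences are empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / longest


def meets_identity_threshold(candidate: str, reference: str,
                             threshold: float = 80.0) -> bool:
    """True when the candidate reaches the stated identity threshold."""
    return percent_identity(candidate, reference) >= threshold
```

For a 20-nt reference, for example, a candidate differing at four positions reaches exactly 80% under this measure, while one differing at five positions (75%) does not.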

[0055] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide having a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs: 1-385 and 386-49787. 



[0056] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable first oligonucleotide which is a por- 
tion of a mRNA transcript of a target gene, and anneals to 
a second oligonucleotide that is endogenously processed 
from a hairpin precursor, wherein binding of the first 
oligonucleotide to the second oligonucleotide represses 
expression of the target gene, and wherein nucleotide se- 
quence of the second nucleotide is selected from the 
group consisting of SEQ ID NOs: 1-385 and 386-49787. 

[0057] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable oligonucleotide having a nucleotide 
sequence selected from the group consisting of SEQ ID 
NOs: 2337129-4223628. 

[0058] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Bordetella pertussis infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a
nucleotide sequence selected from the group consisting
of SEQ ID NOs shown in Table 13 row 2. 

[0059] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Brucella suis 1330 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 3. 

[0060] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Chlamydia trachomatis infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 4. 

[0061] There is additionally provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Chlamydophila pneumoniae AR39 infec- 
tion, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 5. 
[0062] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Chlamydophila pneumoniae CWL029 in- 
fection, wherein binding of the oligonucleotide to the 
mRNA transcript represses expression of the target gene, 
and wherein the oligonucleotide has at least 80% se- 
quence identity with a nucleotide sequence selected from 
the group consisting of SEQ ID NOs shown in Table 13 
row 6. 

[0063] There is further provided in accordance with another pre-
ferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Chlamydophila pneumoniae J138 infec- 
tion, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 7. 
[0064] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Chlamydophila pneumoniae TW-183 in- 
fection, wherein binding of the oligonucleotide to the 
mRNA transcript represses expression of the target gene, 
and wherein the oligonucleotide has at least 80% se- 
quence identity with a nucleotide sequence selected from 
the group consisting of SEQ ID NOs shown in Table 13 
row 8. 

[0065] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Coxiella burnetii RSA 493 infection,
wherein binding of the oligonucleotide to the mRNA tran-
script represses expression of the target gene, and
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 9. 

[0066] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Escherichia coli CFT073 infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 10. 

[0067] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Haemophilus influenzae Rd infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence
identity with a nucleotide sequence selected from the
group consisting of SEQ ID NOs shown in Table 13 row 
11. 

[0068] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Leptospira interrogans serovar lai str. 
56601 infection, wherein binding of the oligonucleotide to 
the mRNA transcript represses expression of the target 
gene, and wherein the oligonucleotide has at least 80% 
sequence identity with a nucleotide sequence selected 
from the group consisting of SEQ ID NOs shown in Table 
13 row 12. 

[0069] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Listeria monocytogenes EGD-e infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the
group consisting of SEQ ID NOs shown in Table 13 row
13. 

[0070] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Mycobacterium avium subsp. paratuber- 
culosis infection, wherein binding of the oligonucleotide 
to the mRNA transcript represses expression of the target 
gene, and wherein the oligonucleotide has at least 80% 
sequence identity with a nucleotide sequence selected 
from the group consisting of SEQ ID NOs shown in Table 
13 row 14. 

[0071] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Mycobacterium bovis subsp. bovis
AF2122/97 infection, wherein binding of the oligonu- 
cleotide to the mRNA transcript represses expression of 
the target gene, and wherein the oligonucleotide has at 
least 80% sequence identity with a nucleotide sequence 
selected from the group consisting of SEQ ID NOs shown
in Table 13 row 15.

[0072] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Mycobacterium leprae infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 16. 

[0073] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Mycobacterium tuberculosis CDC1551 in- 
fection, wherein binding of the oligonucleotide to the 
mRNA transcript represses expression of the target gene, 
and wherein the oligonucleotide has at least 80% se- 
quence identity with a nucleotide sequence selected from 
the group consisting of SEQ ID NOs shown in Table 13 
row 17. 

[0074] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Mycobacterium tuberculosis H37Rv infec- 
tion, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
18. 

[0075] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Neisseria meningitidis MC58 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
19. 

[0076] There is still further provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Neisseria meningitidis Z2491 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
20. 

[0077] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Pseudomonas aeruginosa PAO1 infection,
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
21. 

[0078] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Pseudomonas putida KT2440 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
22. 

[0079] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Rickettsia prowazekii infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 23. 

[0080] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Salmonella enterica enterica serovar Typhi
infection, wherein binding of the oligonucleotide to the
mRNA transcript represses expression of the target gene, 
and wherein the oligonucleotide has at least 80% se- 
quence identity with a nucleotide sequence selected from 
the group consisting of SEQ ID NOs shown in Table 13 
row 24. 

[0081] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Salmonella enterica enterica serovar Typhi 
Ty2 infection, wherein binding of the oligonucleotide to 
the mRNA transcript represses expression of the target 
gene, and wherein the oligonucleotide has at least 80% 
sequence identity with a nucleotide sequence selected 
from the group consisting of SEQ ID NOs shown in Table 
13 row 25. 

[0082] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Salmonella typhimurium LT2 infection,
wherein binding of the oligonucleotide to the mRNA tran-
script represses expression of the target gene, and
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
26. 

[0083] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Shigella flexneri 2a str. 2457T infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
27. 

[0084] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Shigella flexneri 2a str. 301 infection, 
wherein binding of the oligonucleotide to the mRNA tran-
script represses expression of the target gene, and
wherein the oligonucleotide has at least 80% sequence
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
28. 

[0085] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Staphylococcus aureus subsp. aureus 
Mu50 infection, wherein binding of the oligonucleotide to 
the mRNA transcript represses expression of the target 
gene, and wherein the oligonucleotide has at least 80% 
sequence identity with a nucleotide sequence selected 
from the group consisting of SEQ ID NOs shown in Table 
13 row 29. 

[0086] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Staphylococcus aureus subsp. aureus 
MW2 infection, wherein binding of the oligonucleotide to 
the mRNA transcript represses expression of the target 
gene, and wherein the oligonucleotide has at least 80%
sequence identity with a nucleotide sequence selected
from the group consisting of SEQ ID NOs shown in Table 
13 row 30. 

[0087] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Staphylococcus aureus subsp. aureus 
N315 infection, wherein binding of the oligonucleotide to 
the mRNA transcript represses expression of the target 
gene, and wherein the oligonucleotide has at least 80% 
sequence identity with a nucleotide sequence selected 
from the group consisting of SEQ ID NOs shown in Table 
13 row 31. 

[0088] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Streptococcus pneumoniae R6 infection, 
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the
group consisting of SEQ ID NOs shown in Table 13 row
32. 

[0089] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Streptococcus pneumoniae TIGR4 infec- 
tion, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
33. 

[0090] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Streptococcus pyogenes M1 GAS infection,
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row
34.

[0091] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Streptococcus pyogenes MGAS315 infec- 
tion, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
35. 

[0092] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Streptococcus pyogenes MGAS8232 infec-
tion, wherein binding of the oligonucleotide to the mRNA
transcript represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
36.

[0093] There is additionally provided in accordance with another
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Streptococcus pyogenes SSI-1 infection,
wherein binding of the oligonucleotide to the mRNA tran- 
script represses expression of the target gene, and 
wherein the oligonucleotide has at least 80% sequence 
identity with a nucleotide sequence selected from the 
group consisting of SEQ ID NOs shown in Table 13 row 
37. 

[0094] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Treponema pallidum subsp. pallidum str. 
Nichols infection, wherein binding of the oligonucleotide 
to the mRNA transcript represses expression of the target 
gene, and wherein the oligonucleotide has at least 80% 
sequence identity with a nucleotide sequence selected 
from the group consisting of SEQ ID NOs shown in Table 
13 row 38. 

[0095] There is further provided in accordance with another pre-
ferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which an-
neals to a portion of a mRNA transcript of a target gene
associated with Yersinia pestis infection, wherein binding 
of the oligonucleotide to the mRNA transcript represses 
expression of the target gene, and wherein the oligonu- 
cleotide has at least 80% sequence identity with a nu- 
cleotide sequence selected from the group consisting of 
SEQ ID NOs shown in Table 13 row 39. 

[0096] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which an- 
neals to a portion of a mRNA transcript of a target gene 
associated with Yersinia pestis KIM infection, wherein 
binding of the oligonucleotide to the mRNA transcript re- 
presses expression of the target gene, and wherein the 
oligonucleotide has at least 80% sequence identity with a 
nucleotide sequence selected from the group consisting 
of SEQ ID NOs shown in Table 13 row 40. 
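
The embodiments above each recite "at least 80% sequence identity" with a listed sequence. The text does not state how identity is to be computed, so the following is only a minimal sketch under a stated assumption: the two sequences are already aligned, have equal length, and are compared position by position without gaps. The function names `percent_identity` and `meets_identity_threshold` are hypothetical and are not taken from the specification.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    Assumption: the sequences are pre-aligned; T and U are treated as the
    same base so DNA and RNA spellings can be compared directly.
    """
    a = a.upper().replace("T", "U")
    b = b.upper().replace("T", "U")
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_identity_threshold(candidate: str, reference: str,
                             threshold: float = 80.0) -> bool:
    """True if the candidate reaches the identity threshold (80% by default)."""
    return percent_identity(candidate, reference) >= threshold
```

In practice a gapped alignment would normally be performed first; the sketch only illustrates the final percentage step.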

[0097] There is additionally provided in accordance with another 
preferred embodiment of the present invention a method 
for treatment of a disease involving a tissue in which a 
protein is pathologically expressed to an undesirable ex-
tent, the protein having a messenger RNA, the method in-
cluding: providing a material which modulates activity of a
microRNA oligonucleotide which binds complementarily to 
a segment of the messenger RNA, and introducing the 
material into the tissue, causing modulation of the activity 
of the microRNA oligonucleotide and thereby modulating 
expression of the protein in a desired manner. 
[0098] There is moreover provided in accordance with another 
preferred embodiment of the present invention a method 
for treatment of a disease involving tissue in which a pro- 
tein is pathologically expressed to an undesirable extent, 
the protein having a messenger RNA, the method includ- 
ing: providing a material which at least partially binds a 
segment of the messenger RNA that is bound comple- 
mentarily by a microRNA oligonucleotide, thereby modu- 
lating expression of the protein, and introducing the ma- 
terial into the tissue, thereby modulating expression of 
the protein. 

[0099] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a method for 
treatment of a disease involving a tissue in which a pro- 
tein is pathologically over-expressed, the protein having a 
messenger RNA, the method including: providing a mi-
croRNA oligonucleotide which binds complementarily to a
segment of the messenger RNA, and introducing the mi- 
croRNA oligonucleotide into the tissue, causing the mi- 
croRNA oligonucleotide to bind complementarily to a seg- 
ment of the messenger RNA and thereby inhibit expres- 
sion of the protein. 
[0100] There is still further provided in accordance with another 
preferred embodiment of the present invention a method 
for treatment of a disease involving a tissue in which a 
protein is pathologically over-expressed, the protein hav- 
ing a messenger RNA, the method including: providing a 
chemically-modified microRNA oligonucleotide which 
binds complementarily to a segment of the messenger 
RNA, and introducing the chemically-modified microRNA 
oligonucleotide into the tissue, causing the microRNA 
oligonucleotide to bind complementarily to a segment of 
the messenger RNA and thereby inhibit expression of the 
protein. 

[0101] There is additionally provided in accordance with another 
preferred embodiment of the present invention a method 
for treatment of a disease involving a tissue in which a 
protein is pathologically under-expressed, the protein 
having a messenger RNA, the method including: providing
an oligonucleotide that inhibits activity of a microRNA
oligonucleotide which binds complementarily to a seg- 
ment of the messenger RNA, and introducing the oligonu- 
cleotide into the tissue, causing inhibition of the activity 
of the microRNA oligonucleotide and thereby promotion 
of translation of the protein. 
[0102] There is moreover provided in accordance with another 
preferred embodiment of the present invention a method 
for treatment of a disease involving a tissue in which a 
protein is pathologically under-expressed, the protein 
having a messenger RNA, the method including: providing 
a chemically-modified oligonucleotide that inhibits activ- 
ity of a microRNA oligonucleotide which binds comple- 
mentarily to a segment of the messenger RNA, and intro- 
ducing the chemically-modified oligonucleotide into the 
tissue, causing inhibition of the activity of the microRNA 
oligonucleotide and thereby promotion of translation of 
the protein. 

[0103] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a method for 
diagnosis of a disease involving a tissue in which a protein 
is expressed to an abnormal extent, the protein having a
messenger RNA, the method including: assaying a mi-
croRNA oligonucleotide which at least partially binds a
segment of the messenger RNA and modulates expression 
of the protein, thereby providing an indication of at least 
one parameter of the disease. 
[0104] There is still further provided in accordance with another 
preferred embodiment of the present invention a method 
for detection of expression of an oligonucleotide, the 
method including: determining a first nucleotide sequence 
of a first oligonucleotide, which first nucleotide sequence 
is not complementary to a genome of an organism, re- 
ceiving a second nucleotide sequence of a second 
oligonucleotide whose expression is sought to be de- 
tected, designing a third nucleotide sequence that is com- 
plementary to the second nucleotide sequence of the sec- 
ond oligonucleotide, and a fourth nucleotide sequence 
that is complementary to a fifth nucleotide sequence 
which is different from the second nucleotide sequence of 
the second oligonucleotide by at least one nucleotide, 
synthesizing a first oligonucleotide probe having a sixth 
nucleotide sequence including the third nucleotide se- 
quence followed by the first nucleotide sequence of the 
first oligonucleotide, and a second oligonucleotide probe 
having a seventh nucleotide sequence including the fourth
nucleotide sequence followed by the first nucleotide se-
quence of the first oligonucleotide, locating the first
oligonucleotide probe and the second oligonucleotide 
probe on a microarray platform, receiving an RNA test 
sample from at least one tissue of the organism, obtaining 
size-fractionated RNA from the RNA test sample, amplify- 
ing the size-fractionated RNA, hybridizing the adaptor- 
linked RNA with the first and second oligonucleotide 
probes on the microarray platform, and determining ex- 
pression of the first oligonucleotide in the at least one tis- 
sue of the organism, based at least in part on the hy- 
bridizing. 
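
The probe-design steps recited above can be sketched in code. The sketch assumes DNA-alphabet inputs, a single central substitution for the mismatch control, and a placeholder tail sequence; the names `design_probes`, `reverse_complement` and `SUBSTITUTE` are hypothetical. Whether a given tail is truly non-complementary to the genome of the organism is not checked here.

```python
# Complement table for DNA probes (assumption: probes are written as DNA).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical substitution rule: guarantees the replaced base differs.
SUBSTITUTE = {"A": "C", "C": "A", "G": "T", "T": "G"}


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_probes(target: str, tail: str) -> tuple[str, str]:
    """Return (perfect_match_probe, mismatch_probe).

    target: the "second" sequence whose expression is to be detected.
    tail:   the "first" sequence, assumed not complementary to the genome.
    """
    target = target.upper()
    # "Third" sequence: complementary to the target.
    perfect = reverse_complement(target)
    # "Fifth" sequence: the target altered by one nucleotide (central position).
    pos = len(target) // 2
    variant = target[:pos] + SUBSTITUTE[target[pos]] + target[pos + 1:]
    # "Fourth" sequence: complementary to the altered target.
    mismatch = reverse_complement(variant)
    # Each probe is its complementary part followed by the common tail.
    return perfect + tail, mismatch + tail
```

Comparing hybridization signal between the two probes gives a rough indication of specificity, since the mismatch probe differs from the perfect-match probe at one position only.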

[0105] There is additionally provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated polynucleotide which is en- 
dogenously processed into a plurality of hairpin-shaped 
precursor oligonucleotides, each of which is endogenously 
processed into a respective oligonucleotide, which in turn 
anneals to a portion of a mRNA transcript of a target 
gene, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene. 

[0106] There is moreover provided in accordance with another
preferred embodiment of the present invention a bioinfor-
matically detectable isolated oligonucleotide which is en-
dogenously processed from a hairpin-shaped precursor,
and anneals to a portion of a mRNA transcript of a target 
gene, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the target gene does not encode a protein. 

[0107] There is further provided in accordance with another pre- 
ferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which is en- 
dogenously processed from a hairpin-shaped precursor, 
and anneals to a portion of a mRNA transcript of a target 
gene, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein a function of the oligonucleotide includes modu- 
lation of cell type. 

[0108] There is still further provided in accordance with another 
preferred embodiment of the present invention a bioinfor- 
matically detectable isolated oligonucleotide which is en- 
dogenously processed from a hairpin-shaped precursor, 
and anneals to a portion of a mRNA transcript of a target 
gene, wherein binding of the oligonucleotide to the mRNA 
transcript represses expression of the target gene, and 
wherein the oligonucleotide is maternally transferred by a
cell to at least one daughter cell of the cell, and a function
of the oligonucleotide includes modulation of cell type of 
the daughter cell. 
[0109] There is additionally provided in accordance with another 
preferred embodiment of the present invention a method 
for bioinformatic detection of microRNA oligonucleotides, 
the method including: bioinformatically detecting a hair- 
pin-shaped precursor oligonucleotide, bioinformatically 
detecting an oligonucleotide which is endogenously pro- 
cessed from the hairpin-shaped precursor oligonu- 
cleotide, and bioinformatically detecting a target gene of 
the oligonucleotide wherein the oligonucleotide anneals to 
at least one portion of a mRNA transcript of the target 
gene, and wherein the binding represses expression of 
the target gene, and the target gene is associated with a 
disease. 
Brief Description of Drawings 

[0110] Fig. 1 is a simplified diagram illustrating a mode by which
an oligonucleotide of a novel group of oligonucleotides of 
the present invention modulates expression of known tar- 
get genes; 

[0111] Fig. 2 is a simplified block diagram illustrating a bioinfor-
matic oligonucleotide detection system capable of detect-
ing oligonucleotides of the novel group of oligonu-
cleotides of the present invention, which system is con-
structed and operative in accordance with a preferred em-
bodiment of the present invention;

[0112] Fig. 3 is a simplified flowchart illustrating operation of a
mechanism for training of a computer system to recog- 
nize the novel oligonucleotides of the present invention, 
which mechanism is constructed and operative in accor- 
dance with a preferred embodiment of the present inven- 
tion; 

[0113] Fig. 4A is a simplified block diagram of a non-coding ge-
nomic sequence detector constructed and operative in ac-
cordance with a preferred embodiment of the present in-
vention;

[0114] Fig. 4B is a simplified flowchart illustrating operation of a
non-coding genomic sequence detector constructed and 
operative in accordance with a preferred embodiment of 
the present invention; 

[0115] Fig. 5A is a simplified block diagram of a hairpin detector 
constructed and operative in accordance with a preferred 
embodiment of the present invention; 

[0116] Fig. 5B is a simplified flowchart illustrating operation of a 
hairpin detector constructed and operative in accordance
with a preferred embodiment of the present invention;

[0117] Fig. 6A is a simplified block diagram of a Dicer-cut loca-
tion detector constructed and operative in accordance
with a preferred embodiment of the present invention; 

[0118] Fig. 6B is a simplified flowchart illustrating training of a
Dicer-cut location detector constructed and operative in 
accordance with a preferred embodiment of the present 
invention; 

[0119] Fig. 6C is a simplified flowchart illustrating operation of a 
Dicer-cut location detector constructed and operative in 
accordance with a preferred embodiment of the present 
invention; 

[0120] Fig. 7A is a simplified block diagram of a target gene
binding site detector constructed and operative in accor-
dance with a preferred embodiment of the present inven-
tion;

[0121] Fig. 7B is a simplified flowchart illustrating operation of a 
target gene binding site detector constructed and opera- 
tive in accordance with a preferred embodiment of the 
present invention; 

[0122] Fig. 8 is a simplified flowchart illustrating operation of a 
function and utility analyzer constructed and operative in 
accordance with a preferred embodiment of the present
invention;

[0123] Fig. 9 is a simplified diagram describing a novel bioinfor-
matically-detected group of regulatory polynucleotides,
referred to here as Genomic Record (GR) polynucleotides, 
each of which encodes an "operon-like" cluster of novel 
microRNA-like oligonucleotides, which in turn modulate 
expression of one or more target genes; 

[0124] Fig. 10 is a block diagram illustrating different utilities of
novel oligonucleotides and novel operon-like polynu- 
cleotides, both of the present invention; 

[0125] Figs. 11A and 11B are simplified diagrams which, when 
taken together, illustrate a mode of oligonucleotide ther- 
apy applicable to novel oligonucleotides of the present in- 
vention; 

[0126] Fig. 12A is a bar graph illustrating performance results of 
a hairpin detector constructed and operative in accor- 
dance with a preferred embodiment of the present inven- 
tion; 

[0127] Fig. 12B is a line graph illustrating accuracy of a Dicer-cut 
location detector constructed and operative in accordance 
with a preferred embodiment of the present invention; 

[0128] Fig. 12C is a bar graph illustrating performance results of 
the target gene binding site detector 118, constructed and
operative in accordance with a preferred embodiment of
the present invention.

[0129] Fig. 13 is a summary table of laboratory results validating
expression of novel human oligonucleotides detected by a 
bioinformatic oligonucleotide detection engine con- 
structed and operative in accordance with a preferred em- 
bodiment of the present invention, thereby validating its 
efficacy; 

[0130] Fig. 14A is a schematic representation of an "operon-like"
cluster of novel human hairpin sequences detected by a
bioinformatic oligonucleotide detection engine con-
structed and operative in accordance with a preferred em-
bodiment of the present invention, and non-GAM hairpin
sequences used as negative controls thereto;

[0131] Fig. 14B is a schematic representation of secondary fold-
ing of hairpins of the operon-like cluster of Fig. 14A;

[0132] Fig. 14C is a picture of laboratory results demonstrating 
expression of novel oligonucleotides of Figs. 14A and 14B 
and lack of expression of the negative controls, thereby 
validating efficacy of bioinformatic detection of GAM 
oligonucleotides and GR polynucleotides detected by a 
bioinformatic oligonucleotide detection engine, con-
structed and operative in accordance with a preferred em-
bodiment of the present invention;

[0133] Fig. 15A is an annotated sequence of EST72223 compris-
ing known human microRNA oligonucleotide MIR98 and
novel human oligonucleotide GAM25 PRECURSOR detected
by the oligonucleotide detection system of the present in- 
vention; and 

[0134] Figs. 15B, 15C and 15D are pictures of laboratory results
demonstrating laboratory confirmation of expression of 
known human oligonucleotide MIR98 and of novel bioin- 
formatically-detected human GAM25 RNA respectively, 
both of Fig. 15A, thus validating the bioinformatic 
oligonucleotide detection system of the present invention; 

[0135] Figs. 16A, 16B and 16C are schematic diagrams which,
when taken together, represent methods of designing 
primers to identify specific hairpin oligonucleotides in ac- 
cordance with a preferred embodiment of the present in- 
vention. 

[0136] Fig. 17A is a simplified flowchart illustrating construction 
of a microarray constructed and operative to identify novel 
oligonucleotides of the present invention, in accordance 
with a preferred embodiment of the present invention; 

[0137] Fig. 17B is a simplified block diagram illustrating design 
of a microarray constructed and operative to identify novel
oligonucleotides of the present invention, in accordance
with a preferred embodiment of the present invention; 

[0138] Fig. 17C is a flowchart illustrating a mode of preparation
and amplification of a cDNA library in accordance with a 
preferred embodiment of the present invention; 

[0139] Fig. 18A is a line graph showing results of detection of
known microRNA oligonucleotides and of novel GAM 
oligonucleotides, using a microarray constructed and op- 
erative in accordance with a preferred embodiment of the 
present invention; 

[0140] Fig. 18B is a line graph showing specificity of hybridiza- 
tion of a microarray constructed and operative in accor- 
dance with a preferred embodiment of the present inven- 
tion; and 

[0141] Fig. 18C is a summary table demonstrating detection of 
known microRNA oligonucleotides using a microarray 
constructed and operative in accordance with a preferred 
embodiment of the present invention. 
Brief Description of Sequences 

[0142] A Sequence Listing of genomic sequences of the present
invention designated SEQ ID NO:1 through SEQ ID NO:
4,254,670 is attached to this application, and is hereby
incorporated herein. The genomic listing comprises the
following nucleotide sequences: nucleotide sequences of
21,916 bacterial and 6,100 human GAM precursors of re- 
spective novel oligonucleotides of the present invention; 
nucleotide sequences of 32,713 bacterial and 11,428 hu- 
man GAM RNA oligonucleotides of respective novel 
oligonucleotides of the present invention; and nucleotide 
sequences of 1,507,219 target gene binding sites of re- 
spective novel oligonucleotides of the present invention. 
Detailed Description 

[0143] Reference is now made to Fig. 1, which is a simplified
diagram describing a plurality of novel bioinformatically
detected oligonucleotides of the present invention, referred
to here as Genomic Address Messenger (GAM) oligonucleotides,
each of which modulates expression of respective target
genes whose function and utility are known in the art.

[0144] GAM is a novel bioinformatically detectable regulatory,
non-protein-coding, miRNA-like oligonucleotide. The 
method by which GAM is detected is described with addi- 
tional reference to Figs. 1-8. 

[0145] The GAM PRECURSOR is preferably encoded by a bacterial
genome. Alternatively or additionally, the GAM PRECUR-
SOR is preferably encoded by the human genome. The
GAM TARGET GENE is a gene encoded by the human
genome. Alternatively or additionally, the GAM TARGET
GENE is a gene encoded by a bacterial genome.

[0146] The GAM PRECURSOR encodes a GAM PRECURSOR RNA.
Similar to other miRNA oligonucleotides, the GAM PRE- 
CURSOR RNA does not encode a protein. 

[0147] GAM PRECURSOR RNA folds onto itself, forming GAM
FOLDED PRECURSOR RNA, which has a two-dimensional
"hairpin structure". As is well known in the art, this
"hairpin structure" is typical of RNA encoded by known
miRNA precursor oligonucleotides and is due to the full
or partial complementarity of the nucleotide sequence of
the first half of the RNA that is encoded by an miRNA
precursor oligonucleotide to the nucleotide sequence of
the second half thereof.
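
The half-to-half complementarity described above can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions: the loop is a fixed-length segment in the middle of the sequence, the two arms have equal length, and only Watson-Crick pairs are counted (no G-U wobble, bulges or folding energies). The name `arm_complementarity` and the `loop_len` parameter are hypothetical.

```python
# Watson-Crick RNA base pairs only (assumption: wobble pairs are ignored).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def arm_complementarity(rna: str, loop_len: int = 4) -> float:
    """Fraction of positions at which the 5' arm pairs with the 3' arm.

    The first half (5' arm) is compared with the second half (3' arm) read
    in reverse, since the strand folds back onto itself.
    """
    rna = rna.upper().replace("T", "U")
    arm_len = (len(rna) - loop_len) // 2
    if arm_len <= 0:
        raise ValueError("sequence too short for the given loop length")
    five_prime = rna[:arm_len]
    three_prime_reversed = rna[-arm_len:][::-1]
    paired = sum((x, y) in PAIRS
                 for x, y in zip(five_prime, three_prime_reversed))
    return paired / arm_len
```

A value of 1.0 corresponds to full complementarity of the two halves; lower values correspond to the partial complementarity also mentioned above.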

[0148] A complementary sequence is a sequence which is re-
versed and wherein each nucleotide is replaced by a com-
plementary nucleotide, as is well known in the art (e.g.
ATGGC is the complementary sequence of GCCAT).
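
The definition above corresponds to what is commonly called the reverse complement. A minimal sketch follows, assuming DNA-style output in which A pairs with T; the function name `complementary_sequence` is hypothetical.

```python
# Map each nucleotide to its complement; U is accepted on input and
# treated like T (assumption: output is written in the DNA alphabet).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def complementary_sequence(seq: str) -> str:
    """Replace each nucleotide by its complement, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


# Reproduces the example given in the text.
print(complementary_sequence("GCCAT"))  # ATGGC
```

Applying the function twice to a DNA sequence returns the original sequence, which is a convenient sanity check.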

[0149] An enzyme complex designated DICER COMPLEX, composed
of Dicer RNaseIII together with other necessary proteins,
cuts the GAM FOLDED PRECURSOR RNA, yielding a
single-stranded ~22 nt-long RNA segment designated
GAM RNA.

[0150] GAM TARGET GENE encodes a corresponding messenger
RNA, designated GAM TARGET RNA. As is typical of mRNA
of a protein-coding gene, each of the GAM TARGET RNAs of
the present invention comprises three regions: a 5'
untranslated region, a protein-coding region and a 3'
untranslated region, designated 5'UTR, PROTEIN-CODING
and 3'UTR, respectively.

[0151] GAM RNA binds complementarily to one or more target 
binding sites located in the untranslated regions of each 
of the GAM TARGET RNAs of the present invention. This 
complementary binding is due to the partial or full com- 
plementarity between the nucleotide sequence of GAM 
RNA and the nucleotide sequence of each of the target 
binding sites. As an illustration, Fig. 1 shows three such 
target binding sites, designated BINDING SITE I, BINDING 
SITE II and BINDING SITE III, respectively. It is appreciated 
that the number of target binding sites shown in Fig. 1 is 
only illustrative and that any suitable number of target 
binding sites may be present. It is further appreciated that 
although Fig. 1 shows target binding sites only in the
3'UTR region, these target binding sites may instead be
located in the 5'UTR region or in both the 3'UTR and 
5'UTR regions. 
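
The notion of target binding sites with full or partial complementarity can be sketched as a sliding-window scan of an untranslated region. This is an illustrative simplification: it scores ungapped Watson-Crick matches only, and the threshold `min_fraction` as well as the names `find_binding_sites` and `rna_reverse_complement` are hypothetical. It is not the target gene binding site detector of the present invention.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(rna: str) -> str:
    """Complement each RNA base, then reverse."""
    return rna.translate(_RNA_COMPLEMENT)[::-1]


def find_binding_sites(gam_rna: str, utr: str,
                       min_fraction: float = 0.8) -> list[tuple[int, float]]:
    """Return (start, matched_fraction) for each UTR window that is at
    least min_fraction complementary to the GAM RNA."""
    gam = gam_rna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    # A fully complementary site reads as the reverse complement of the GAM RNA.
    ideal_site = rna_reverse_complement(gam)
    n = len(ideal_site)
    hits = []
    for start in range(len(utr) - n + 1):
        window = utr[start:start + n]
        fraction = sum(x == y for x, y in zip(window, ideal_site)) / n
        if fraction >= min_fraction:
            hits.append((start, fraction))
    return hits
```

The same scan could be applied to a 5'UTR or a 3'UTR sequence, consistent with the statement that binding sites may lie in either region.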

[0152] The complementary binding of GAM RNA to target binding 
sites on GAM TARGET RNA, such as BINDING SITE I, BIND- 
ING SITE II and BINDING SITE III, inhibits the translation of 
each of the GAM TARGET RNAs of the present invention 
into a respective GAM TARGET PROTEIN, shown surrounded
by a broken line. 

[0153] It is appreciated that the GAM TARGET GENE in fact repre-
sents a plurality of GAM target genes. The mRNA of each
one of this plurality of GAM target genes comprises one or 
more target binding sites, each having a nucleotide se- 
quence which is at least partly complementary to GAM 
RNA and which when bound by GAM RNA causes inhibi- 
tion of translation of the GAM target mRNA into a corre- 
sponding GAM target protein. 

[0154] The mechanism of the translational inhibition that is ex- 
erted by GAM RNA on one or more GAM TARGET GENEs 
may be similar or identical to the known mechanism of 
translational inhibition exerted by known miRNA oligonu- 
cleotides. 

[0155] The nucleotide sequences of each of a plurality of GAM
oligonucleotides that are described by Fig. 1 and their respective
genomic sources and genomic locations are set
forth in Tables 1-3, hereby incorporated herein. 

[0156] The nucleotide sequences of GAM PRECURSOR RNAs, and
a schematic representation of a predicted secondary fold- 
ing of GAM FOLDED PRECURSOR RNAs, of each of a plural- 
ity of GAM oligonucleotides that are described by Fig. 1 
are set forth in Table 4, hereby incorporated herein. 

[0157] The nucleotide sequences of "diced" GAM RNAs of each of 
a plurality of GAM oligonucleotides that are described by 
Fig. 1 are set forth in Table 5, hereby incorporated herein. 

[0158] The nucleotide sequences of target binding sites, such as 
BINDING SITE I, BINDING SITE II and BINDING SITE III that 
are found on GAM TARGET RNAs of each of a plurality of 
GAM oligonucleotides that are described by Fig. 1, and a 
schematic representation of the complementarity of each 
of these target binding sites to each of a plurality of GAM 
RNAs that are described by Fig. 1 are set forth in Tables 
6-7, hereby incorporated herein. 

[0159] It is appreciated that the specific functions and accordingly
the utilities of each of a plurality of GAM oligonucleotides
that are described by Fig. 1 are correlated with
and may be deduced from the identity of the GAM TARGET
GENEs inhibited thereby, and whose functions are set
forth in Table 8, hereby incorporated herein.

[0160] Studies documenting the well known correlations between
each of a plurality of GAM TARGET GENEs that are de- 
scribed by Fig. 1 and the known gene functions and re- 
lated diseases are listed in Table 9, hereby incorporated 
herein. 

[0161] The present invention discloses a novel group of bacterial 
and human oligonucleotides, belonging to the miRNA-like 
oligonucleotide group, here termed GAM oligonucleotides, 
for which a specific complementary binding has been de- 
termined bioinformatically. 

[0162] Reference is now made to Fig. 2, which is a simplified
block diagram illustrating a bioinformatic oligonucleotide
detection system and method constructed and operative 
in accordance with a preferred embodiment of the present 
invention. 

[0163] An important feature of the present invention is a bioin- 
formatic oligonucleotide detection engine 100, which is 
capable of bioinformatically detecting oligonucleotides of 
the present invention. 

[0164] The functionality of the bioinformatic oligonucleotide detection
engine 100 includes receiving expressed RNA data
102, sequenced DNA data 104, and protein function data
106; performing a complex process of analysis of this
data, as elaborated hereinbelow; and, based on this analysis,
providing information, designated by reference numeral
108, identifying and describing features of novel oligonucleotides.

[0165] Expressed RNA data 102 comprises published expressed 
sequence tags (EST) data, published mRNA data, as well as 
other published RNA data. Sequenced DNA data 104 com- 
prises alphanumeric data representing genomic se- 
quences and preferably including annotations such as in- 
formation indicating the location of known protein-coding 
regions relative to the genomic sequences. 

[0166] Protein function data 106 comprises information from scientific
publications, e.g. physiological functions of known
proteins and their connection, involvement and possible 
utility in treatment and diagnosis of various diseases. 

[0167] Expressed RNA data 102 and sequenced DNA data 104
may preferably be obtained from data published by the
National Center for Biotechnology Information (NCBI) at
the National Institutes of Health (NIH) (Jenuth, J.P.,
Methods Mol. Biol. 132:301-312 (2000), herein incorporated
by reference) as well as from various other published
data sources. Protein function data 106 may preferably
be obtained from any one of numerous relevant published
data sources, such as the Online Mendelian Inheritance
in Man (OMIM(TM), Hamosh et al., Nucleic
Acids Res. 30: 52-55 (2002)) database developed by Johns
Hopkins University, and also published by NCBI (2000).

[0168] Prior to or during actual detection of the bioinformatically-detected
group of novel oligonucleotides 108 by the bioinformatic
oligonucleotide detection engine 100, bioinformatic
oligonucleotide detection engine training & validation
functionality 110 is operative. This functionality uses
one or more known miRNA oligonucleotides as a training 
set to train the bioinformatic oligonucleotide detection 
engine 100 to bioinformatically recognize miRNA-like 
oligonucleotides, and their respective potential target 
binding sites. Bioinformatic oligonucleotide detection en- 
gine training & validation functionality 110 is further de- 
scribed hereinbelow with reference to Fig. 3. 

[0169] The bioinformatic oligonucleotide detection engine 100 
preferably comprises several modules which are prefer- 
ably activated sequentially, and are described as follows: 

[0170] A non-protein-coding genomic sequence detector 112
operative to bioinformatically detect non-protein-coding
genomic sequences. The non-protein-coding genomic sequence
detector 112 is further described hereinbelow
with reference to Figs. 4A and 4B. 

[0171] A hairpin detector 114 operative to bioinformatically detect
genomic "hairpin-shaped" sequences, similar to GAM
FOLDED PRECURSOR RNA (Fig. 1). The hairpin detector 
114 is further described herein below with reference to 
Figs. 5A and 5B. 

[0172] A Dicer-cut location detector 116 operative to bioinformatically
detect the location on a GAM FOLDED PRECURSOR
RNA which is enzymatically cut by DICER COMPLEX
(Fig. 1), yielding "diced" GAM RNA. The Dicer-cut location 
detector 116 is further described herein below with refer- 
ence to Figs. 6A-6C. 

[0173] A target gene binding site detector 118 operative to
bioinformatically detect target genes having binding sites,
the nucleotide sequence of which is partially complemen- 
tary to that of a given genomic sequence, such as a nu- 
cleotide sequence cut by DICER COMPLEX. The target gene 
binding site detector 118 is further described hereinbelow 
with reference to Figs. 7A and 7B. 

[0174] A function & utility analyzer, designated by reference numeral
120, operative to analyze the function and utility
of target genes in order to identify target genes which
have a significant clinical function and utility. The function
& utility analyzer 120 is further described hereinbelow
with reference to Fig. 8.
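
The sequential activation of the modules listed above may be pictured, purely as an illustrative sketch, as a chain of stages in which each stage consumes the output of the preceding one. The stage names and the function `run_engine` are hypothetical and are not part of the present disclosure; the stages themselves are left as placeholders.

```python
from typing import Callable, List, Sequence

# Each stage takes the previous stage's output and returns its own output.
Stage = Callable[[list], list]

# Hypothetical stage names, in the order in which the modules are listed.
STAGE_ORDER = [
    "non_protein_coding_genomic_sequence_detector_112",
    "hairpin_detector_114",
    "dicer_cut_location_detector_116",
    "target_gene_binding_site_detector_118",
    "function_and_utility_analyzer_120",
]


def run_engine(inputs: Sequence, stages: List[Stage]) -> list:
    """Apply the stages one after another, feeding each the previous output."""
    data = list(inputs)
    for stage in stages:
        data = stage(data)
    return data
```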

[0175] According to an embodiment of the present invention, the
bioinformatic oligonucleotide detection engine 100 may 
employ a cluster of 40 personal computers (PCs; XEON (R), 
2.8GHz, with 80GB storage each) connected by Ethernet to 
eight servers (2-CPU, XEON (TM) 1.2-2.2GHz, with 
~200GB storage each) and combined with an 8-processor
server (8-CPU, Xeon 550MHz w/ 8GB RAM) connected via
2 HBA fiber-channels to an EMC CLARIION (TM)
100-disk, 3.6 Terabyte storage device. A preferred embodiment
of the present invention may also preferably
comprise software that utilizes a commercial database 
software program, such as MICROSOFT (TM) SQL Server 
2000. 

[0176] According to a preferred embodiment of the present in- 
vention, the bioinformatic oligonucleotide detection en- 
gine 100 may employ a cluster of 80 Servers (XEON (R), 
2.8GHz, with 80GB storage each) connected by Ethernet to 
eight servers (2-CPU, XEON (TM) 1.2-2.2GHz, with 
~200GB storage each) and combined with a storage device
(Promise Technology Inc., RM8000) having
8 disks, 2 Terabytes total. A preferred embodiment of the
present invention may also preferably comprise software 
that utilizes a commercial database software program, 
such as MICROSOFT (TM) SQL Server 2000. It is appreci- 
ated that the abovementioned hardware configuration is 
not meant to be limiting and is given as an illustration 
only. The present invention may be implemented in a wide 
variety of hardware and software configurations. 

[0177] The present invention discloses 21,916 bacterial and
6,100 human novel oligonucleotides of the GAM group of
oligonucleotides, which have been detected bioinformati- 
cally and 6,056 bacterial and 430 novel polynucleotides of 
the GR group of polynucleotides, which have been de- 
tected bioinformatically. Laboratory confirmation of bioin- 
formatically predicted oligonucleotides of the GAM group 
of oligonucleotides, and several bioinformatically pre- 
dicted polynucleotides of the GR group of polynu- 
cleotides, is described hereinbelow with reference to Figs. 
13-15D, Fig. 18 and Table 12. 

[0178] Reference is now made to Fig. 3, which is a simplified
flowchart illustrating operation of a preferred embodiment
of the bioinformatic oligonucleotide detection engine
training & validation functionality 110 described hereinabove
with reference to Fig. 2.

[0179] Bioinformatic oligonucleotide detection engine training &
validation functionality 110 begins by training the bioin- 
formatic oligonucleotide detection engine 100 (Fig. 2) to 
recognize one or more known miRNA oligonucleotides, as 
designated by reference numeral 122. This training step 
comprises hairpin detector training & validation function- 
ality 124, further described hereinbelow with reference to 
Fig. 5A, Dicer-cut location detector training & validation 
functionality 126, further described hereinbelow with reference
to Figs. 6A and 6B, and target gene binding site detector
training & validation functionality 128, further described
hereinbelow with reference to Fig. 7A.

[0180] Next, the bioinformatic oligonucleotide detection engine 
training & validation functionality 110 is operative to bioinformatically
detect novel oligonucleotides, using bioinformatic
oligonucleotide detection engine 100 (Fig. 2), as
designated by reference numeral 130. Wet lab experi- 
ments are preferably conducted in order to validate ex- 
pression and preferably function of some samples of the 
novel oligonucleotides detected by the bioinformatic 
oligonucleotide detection engine 100, as designated by
reference numeral 132. Figs. 13A-15D, Fig. 18 and Table
12 illustrate examples of wet lab validation of sample 
novel human oligonucleotides bioinformatically-detected 
in accordance with a preferred embodiment of the present 
invention. 

[0181] Reference is now made to Fig. 4A, which is a simplified 
block diagram of a preferred implementation of the non- 
protein-coding genomic sequence detector 112 described 
hereinabove with reference to Fig. 2. The non-pro- 
tein-coding genomic sequence detector 112 preferably 
receives at least two types of published genomic data: Ex- 
pressed RNA data 102 and sequenced DNA data 104. The 
expressed RNA data 102 may include, inter alia, EST data, 
EST clusters data, EST genome alignment data and mRNA 
data. Sources for expressed RNA data 102 include NCBI 
dbEST, NCBI UniGene clusters and mapping data, and TIGR 
gene indices (Kirkness, E.F. and Kerlavage, A.R., Methods
Mol. Biol. 69:261-268 (1997)). Sequenced DNA data 104 
may include sequence data (FASTA format files), and fea- 
ture annotations (GenBank file format) mainly from NCBI 
databases. Based on the abovementioned input data, the 
non-protein-coding genomic sequence detector 112 produces
a plurality of non-protein-coding genomic sequences
136. Preferred operation of the non-protein-coding
genomic sequence detector 112 is described
hereinbelow with reference to Fig. 4B.

[0182] Reference is now made to Fig. 4B, which is a simplified
flowchart illustrating a preferred operation of the non- 
protein-coding genomic sequence detector 112 of Fig. 2. 
Detection of non-protein-coding genomic sequences 136
preferably progresses along one of the following
two paths:

[0183] A first path for detecting non-protein-coding genomic
sequences 136 (Fig. 4A) begins with receipt of a plurality
of known RNA sequences, such as EST data. Each RNA se- 
quence is first compared with known protein-coding DNA 
sequences, in order to select only those RNA sequences 
which are non-protein-coding, i.e. intergenic or intronic 
sequences. This can preferably be performed by using one 
of many alignment algorithms known in the art, such as 
BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)).
This sequence comparison preferably also provides local- 
ization of the RNA sequence on the DNA sequences. 
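
Once the RNA sequences have been localized on the DNA sequences, the selection step of this first path may be sketched as a simple interval test. The sketch below assumes that the alignment step has already produced half-open coordinate intervals on a single genomic sequence; the names are hypothetical and the alignment itself (e.g. by BLAST) is not shown.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one genomic sequence


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def non_coding_alignments(alignments: List[Interval],
                          coding: List[Interval]) -> List[Interval]:
    """Keep only the RNA alignments that touch no known protein-coding region."""
    return [aln for aln in alignments
            if not any(overlaps(aln, cds) for cds in coding)]
```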

[0184] Alternatively, selection of non-protein-coding RNA se- 
quences and their localization on the DNA sequences can 
be performed by using publicly available EST cluster data
and genomic mapping databases, such as the UNIGENE
database published by NCBI or the TIGR database. Such
databases map expressed RNA sequences to DNA sequences
encoding them, find the correct orientation of
EST sequences, and indicate mapping of ESTs to protein- 
coding DNA regions, as is well known in the art. Public 
databases, such as TIGR, may also be used to map an EST 
to a cluster of ESTs, known in the art as Tentative Human 
Consensus and assumed to be expressed as one segment. 
Publicly available genome annotation databases, such as 
NCBI's GenBank, may also be used to deduce expressed 
intronic sequences. 

[0185] Optionally, an attempt may be made to "expand" the non-protein-coding
RNA sequences thus found, by searching for transcription
start and end signals, respectively upstream and
downstream of the location of the RNA on the DNA, as is 
well known in the art. 

[0186] A second path for detecting non-protein-coding genomic
sequences 136 (Fig. 4A) begins with receipt of DNA se- 
quences. The DNA sequences are parsed into non- 
protein-coding sequences, using published DNA annota- 
tion data, by extracting those DNA sequences which are 
between known protein-coding sequences. Next, transcription
start and end signals are sought. If such signals
are found, and depending on their robustness, probable 
expressed non-protein-coding genomic sequences are 
obtained. Such an approach is especially useful for identifying
novel GAM oligonucleotides which are found in proximity
to other known miRNA oligonucleotides, or other
wet lab validated GAM oligonucleotides. Since, as described
hereinbelow with reference to Fig. 9, GAM
oligonucleotides are frequently found in clusters, sequences
located near known miRNA oligonucleotides are
more likely to contain novel GAM oligonucleotides. Optionally,
sequence orthology, i.e. sequence conservation in
an evolutionarily related species, may be used to select genomic
sequences having a relatively high probability of
containing expressed novel GAM oligonucleotides. It is 
appreciated that in detecting non-human GAM oligonu- 
cleotides of the present invention the bioinformatic 
oligonucleotide detection engine 100 utilizes the input 
genomic sequences, without filtering protein-coding re- 
gions detected by the non-protein-coding genomic se- 
quence detector 112, hence non-protein-coding genomic 
sequences 136 refers to GENOMIC SEQUENCES only. 
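
The parsing step of the second path, in which DNA sequences lying between known protein-coding sequences are extracted, may be sketched as the complement of the annotated coding intervals. The sketch assumes half-open coordinates and, as a simplification not stated in the text, also returns the flanking regions before the first and after the last coding interval; the function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def intergenic_regions(seq_len: int, coding: List[Interval]) -> List[Interval]:
    """Return the stretches of a sequence not covered by any coding interval."""
    regions = []
    cursor = 0  # first position not yet known to be coding
    for start, end in sorted(coding):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping annotations
    if cursor < seq_len:
        regions.append((cursor, seq_len))
    return regions
```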

[0187] Reference is now made to Fig. 5A, which is a simplified
block diagram of a preferred implementation of the hairpin
detector 114 described hereinabove with reference to
Fig. 2. 

[0188] The goal of the hairpin detector 114 is to detect hairpin- 
shaped genomic sequences, similar to those of known 
miRNA oligonucleotides. A hairpin-shaped genomic se- 
quence is a genomic sequence, having a first half which is 
at least partially complementary to a second half thereof, 
which causes the halves to fold onto each other, thereby
forming a hairpin structure, as mentioned hereinabove 
with reference to Fig. 1. 

[0189] The hairpin detector 114 (Fig. 2) receives a plurality of 
non-protein-coding genomic sequences 136 (Fig. 4A). 
Following operation of hairpin detector training & valida- 
tion functionality 124 (Fig. 3), the hairpin detector 114 is 
operative to detect and output hairpin-shaped sequences, 
which are found in the non-protein-coding genomic se- 
quences 136. The hairpin-shaped sequences detected by 
the hairpin detector 114 are designated hairpin structures 
on genomic sequences 138. A preferred mode of opera- 
tion of the hairpin detector 114 is described hereinbelow 
with reference to Fig. 5B. 

[0190] Hairpin detector training & validation functionality 124 includes
an iterative process of applying the hairpin detector
114 to known hairpin-shaped miRNA precursor sequences,
calibrating the hairpin detector 114 such that it
identifies a training set of known hairpin-shaped miRNA 
precursor sequences, as well as other similarly hairpin- 
shaped sequences. In a preferred embodiment of the 
present invention, the hairpin detector training & valida- 
tion functionality 124 trains the hairpin detector 114 and 
validates each of the steps of operation thereof described 
hereinbelow with reference to Fig. 5B.

[0191] The hairpin detector training & validation functionality
124 preferably uses two sets of data: the aforesaid training
set of known hairpin-shaped miRNA precursor sequences,
such as hairpin-shaped miRNA precursor sequences
of 440 miRNA oligonucleotides of H. sapiens, M.
musculus, C. elegans, C. briggsae and D. melanogaster,
annotated in the RFAM database (Griffiths-Jones 2003), 
and a background set of about 1000 hairpin-shaped se- 
quences found in expressed non-protein-coding human 
genomic sequences. The background set is expected to 
comprise some valid, previously undetected hairpin-shaped
miRNA-like precursor sequences, and many hairpin-shaped
sequences which are not hairpin-shaped
miRNA-like precursors.

[0192] In a preferred embodiment of the present invention the
efficacy of the hairpin detector 114 (Fig. 2) is confirmed.
For example, when a similarity threshold is chosen such
that 87% of the known hairpin-shaped miRNA precursors
are successfully predicted, only 21.8% of the 1000 hairpin-shaped
sequences of the background set are predicted to
be hairpin-shaped miRNA-like precursors.
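
The two percentages quoted above are the fractions of each set that score at or above a chosen threshold. A minimal sketch of that computation follows; the function name and the toy scores used to exercise it are hypothetical and do not reproduce the data of the present invention.

```python
from typing import Sequence, Tuple


def acceptance_rates(known_scores: Sequence[float],
                     background_scores: Sequence[float],
                     threshold: float) -> Tuple[float, float]:
    """Fraction of known precursors and of background hairpins accepted
    when a hairpin is accepted if its score is >= threshold."""
    known = sum(s >= threshold for s in known_scores) / len(known_scores)
    background = sum(s >= threshold for s in background_scores) / len(background_scores)
    return known, background
```

Raising the threshold lowers both fractions, so the threshold is a trade-off between recovering known precursors and rejecting background hairpins.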

[0193] Reference is now made to Fig. 5B, which is a simplified 
flowchart illustrating preferred operation of the hairpin 
detector 114 of Fig. 2. The hairpin detector 114 preferably 
initially uses a secondary structure folding algorithm 
based on free-energy minimization, such as the MFOLD 
algorithm, described in Mathews et al., J. Mol. Biol.
288:911-940 (1999) and Zuker, M., Nucleic Acids Res. 31:
3406-3415 (2003), the disclosures of which are hereby incorporated
by reference. This algorithm is operative to
calculate probable secondary structure folding patterns of 
the non-protein-coding genomic sequences 136 (Fig. 4A) 
as well as the free-energy of each of these probable sec- 
ondary folding patterns. The secondary structure folding 
algorithm, such as the MFOLD algorithm (Mathews, 1999;
Zuker, 2003), typically provides a listing of the base-pairing
of the folded shape, i.e. a listing of each pair of
connected nucleotides in the sequence. 
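
For illustration, a base-pairing listing of this kind can be derived from a folded structure written in dot-bracket notation, where matching parentheses mark paired nucleotides and dots mark unpaired ones. The use of dot-bracket notation and the function name are assumptions of this sketch; the output format of the folding algorithm itself is not specified here.

```python
from typing import List, Tuple


def base_pairs(dot_bracket: str) -> List[Tuple[int, int]]:
    """Return the sorted list of (i, j) index pairs of connected nucleotides."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)
```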

[0194] Next, the hairpin detector 114 analyzes the results of the 
secondary structure folding patterns, in order to deter- 
mine the presence and location of hairpin folding struc- 
tures. The goal of this second step is to assess the base- 
pairing listing provided by the secondary structure folding 
algorithm, in order to determine whether the base-pairing 
listing describes one or more hairpin-type bonding patterns.
Preferably, each sequence segment corresponding to a
hairpin structure is then separately analyzed by the sec- 
ondary structure folding algorithm in order to determine 
its exact folding pattern and free-energy. 
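
One simple way to locate hairpin-type patterns in a base-pairing listing, offered only as a sketch, is to look for pairs that enclose no other paired nucleotide: each such pair closes the loop of one hairpin. The criterion and the function name are assumptions of the sketch and are not asserted to be the method of the hairpin detector 114.

```python
from typing import List, Tuple


def hairpin_loops(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the pairs (i, j) that close a hairpin loop, i.e. pairs with
    no paired nucleotide strictly between positions i and j."""
    paired = {pos for pair in pairs for pos in pair}
    return sorted((i, j) for i, j in pairs
                  if not any(k in paired for k in range(i + 1, j)))
```

The stem of each hairpin can then be recovered by following the enclosing pairs outward from the loop-closing pair.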

[0195] The hairpin detector 114 then assesses the hairpin structures
found by the previous step, comparing them to hairpin
structures of known miRNA precursors, using various
characteristic hairpin structure features, such as free-energy
and thermodynamic stability, the amount and
type of mismatched nucleotides, the existence of sequence
repeat-elements, the number of mismatched nucleotides
in positions 18-22 counting from the loop, and the percentage
of G nucleotides. Only hairpins that bear a statistically
significant resemblance to the training set of hairpin
structures of known miRNA precursors, according to the
abovementioned parameters, are accepted. 

[0196] In a preferred embodiment of the present invention, similarity
to the training set of hairpin structures of known
miRNA precursors is determined using a "similarity score" 
which is calculated using a multiplicity of terms, where 
each term is a function of one of the abovementioned 
hairpin structure features. The parameters of each func- 
tion are found heuristically from the set of hairpin struc- 
tures of known miRNA precursors, as described herein- 
above with reference to hairpin detector training & valida- 
tion functionality 124 (Fig. 3). The selection of the fea- 
tures and their function parameters is optimized so as to 
achieve maximized separation between the distribution of 
similarity scores of validated miRNA precursor hairpin structures,
and the distribution of similarity scores of hairpin
structures detected in the background set mentioned 
hereinabove with reference to Fig. 5B. 
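
A similarity score built from one term per feature may be sketched as follows. The Gaussian form of each term, the per-feature weight, mean and spread, and the feature names are all assumptions made for illustration; the actual functions and their heuristically found parameters are not given in this section.

```python
import math
from typing import Dict, Tuple

# feature name -> (weight, mean, std); all values here are hypothetical
Params = Dict[str, Tuple[float, float, float]]


def similarity_score(features: Dict[str, float], params: Params) -> float:
    """Sum of per-feature terms; each term peaks at `weight` when the
    feature equals the mean observed in the training set."""
    score = 0.0
    for name, (weight, mean, std) in params.items():
        z = (features[name] - mean) / std
        score += weight * math.exp(-0.5 * z * z)
    return score
```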

[0197] In an alternative preferred embodiment of the present invention,
the step described in the preceding paragraph
may be split into two stages. A first stage implements a 
simplified scoring method, typically based on thresholding
a subset of the hairpin structure features described
hereinabove, and may employ a minimum threshold for
hairpin structure length and a maximum threshold for 
free-energy. A second stage is preferably more stringent, 
and preferably employs a full calculation of the weighted 
sum of terms described hereinabove. The second stage 
preferably is performed only on the subset of hairpin 
structures that survived the first stage. 
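
The two-stage arrangement may be sketched as a cheap threshold filter followed by a full scoring of the survivors only. The dictionary keys, the parameter names and the scoring callback are hypothetical placeholders.

```python
from typing import Callable, Dict, List

Hairpin = Dict[str, float]


def two_stage_filter(hairpins: List[Hairpin],
                     min_length: float,
                     max_free_energy: float,
                     score_fn: Callable[[Hairpin], float],
                     min_score: float) -> List[Hairpin]:
    """Stage 1: keep hairpins passing simple length and free-energy thresholds.
    Stage 2: apply the full (more costly) score to the survivors only."""
    survivors = [h for h in hairpins
                 if h["length"] >= min_length and h["free_energy"] <= max_free_energy]
    return [h for h in survivors if score_fn(h) >= min_score]
```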

[0198] The hairpin detector 114 also attempts to select hairpin 
structures whose thermodynamic stability is similar to 
that of hairpin structures of known miRNA precursors. 
This may be achieved in various ways. A preferred em- 
bodiment of the present invention utilizes the following 
methodology, preferably comprising three logical steps: 

[0199] First, the hairpin detector 114 attempts to group hairpin 
structures into "families" of closely related hairpin structures.
As is known in the art, a secondary structure folding
algorithm typically provides multiple alternative folding
patterns for a given genomic sequence, and indicates
the free-energy of each alternative folding pattern. It is a 
particular feature of the present invention that the hairpin 
detector 114 preferably assesses the various hairpin 
structures appearing in the various alternative folding 
patterns and groups hairpin structures which appear at
identical or similar sequence locations in various alternative
folding patterns into common sequence-location-based
"families" of hairpins. For example, all hairpin
structures whose center is within 7 nucleotides of each 
other may be grouped into a "family". Hairpin structures 
may also be grouped into a "family" if their nucleotide se- 
quences are identical or overlap to a predetermined de- 
gree. 
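
The grouping by center location may be sketched as follows. Hairpins are represented as hypothetical (start, end) coordinate tuples, and a greedy pass over the sorted centers keeps every member within the stated distance of the first member of its family, so that all centers in a family lie within that distance of each other. This is one possible reading of the example above and not necessarily the grouping used by the hairpin detector 114.

```python
from typing import List, Tuple

Hairpin = Tuple[int, int]  # (start, end) positions on the genomic sequence


def group_into_families(hairpins: List[Hairpin],
                        max_center_distance: float = 7) -> List[List[Hairpin]]:
    """Group hairpins whose centers lie within max_center_distance of the
    first (leftmost-centered) member of the family."""
    families: List[dict] = []
    for hp in sorted(hairpins, key=lambda h: (h[0] + h[1]) / 2):
        center = (hp[0] + hp[1]) / 2
        if families and center - families[-1]["first_center"] <= max_center_distance:
            families[-1]["members"].append(hp)
        else:
            families.append({"first_center": center, "members": [hp]})
    return [f["members"] for f in families]
```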

[0200] It is also a particular feature of the present invention that
the hairpin structure "families" are assessed in order to 
select only those families which represent hairpin struc- 
tures that are as thermodynamically stable as those of 
hairpin structures of known miRNA precursors. Preferably,
only families which are represented in at least a selected
majority of the alternative secondary structure folding
patterns, typically 65%, 80% or 100%, are considered to be
sufficiently stable. Our tests suggest that only about 50%
of the hairpin structures predicted by the MFOLD algorithm
with default parameters are members of sufficiently
stable families, compared to about 90% of the hairpin
structures that contain known miRNAs. This percentage depends
on the size of the fragment that was folded. In an alternative
embodiment of the present invention, fragments
of 1000 nt are used as the preferred size. Other embodiments
use other sizes of genomic sequences, or a more
or less strict demand for representation in the alternative
secondary structure folding patterns.
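
The stability criterion may be sketched as the fraction of alternative folding patterns in which at least one member of a family appears. In the sketch, each folding pattern is represented as a set of hypothetical (start, end) hairpin tuples; the representation and the function name are assumptions.

```python
from typing import List, Set, Tuple

Hairpin = Tuple[int, int]


def stable_families(foldings: List[Set[Hairpin]],
                    families: List[List[Hairpin]],
                    min_fraction: float = 0.65) -> List[List[Hairpin]]:
    """Keep families represented in at least min_fraction of the foldings."""
    if not foldings:
        raise ValueError("at least one folding pattern is required")
    selected = []
    for members in families:
        member_set = set(members)
        hits = sum(1 for folding in foldings if member_set & set(folding))
        if hits / len(foldings) >= min_fraction:
            selected.append(members)
    return selected
```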

[0201] It is an additional particular feature of the present invention
that the most suitable hairpin structure is selected
from each selected family. For example, a hairpin struc- 
ture which has the greatest similarity to the hairpin struc- 
tures appearing in alternative folding patterns of the fam- 
ily may be preferred. Alternatively or additionally, the 
hairpin structures having relatively low free-energy may 
be preferred. 

[0202] Alternatively or additionally, considerations of homology to
hairpin structures of other organisms and the existence of 
clusters of thermodynamically stable hairpin structures 
located adjacent to each other along a sequence may be 
important in selection of hairpin structures. The tightness 
of the clusters in terms of their location and the occur- 
rence of both homology and clusters may be of signifi- 
cance. 

[0203] Reference is now made to Figs. 6A-6C, which together 
describe the structure and operation of the Dicer-cut location
detector 116, described hereinabove with reference
to Fig. 2.

[0204] Reference is now made to Fig. 6A, which is a simplified
block diagram of a preferred implementation of the Dicer-cut
location detector 116. The goal of the Dicer-cut location
detector 116 is to detect the location at which the
DICER COMPLEX, described hereinabove with reference to 
Fig. 1, dices GAM FOLDED PRECURSOR RNA, yielding GAM 
RNA. 

[0205] The Dicer-cut location detector 116 therefore receives a
plurality of hairpin structures on genomic sequences, designated
by reference numeral 138 (Fig. 5A), and following
operation of Dicer-cut location detector training & validation
functionality 126 (Fig. 3), is operative to detect a plurality
of Dicer-cut sequences from hairpin structures, designated
by reference numeral 140.

[0206] Reference is now made to Fig. 6B, which is a simplified
flowchart illustrating a preferred implementation of Dicer-cut
location detector training & validation functionality
126. 

[0207] A general goal of the Dicer-cut location detector training
& validation functionality 126 is to analyze the Dicer-cut
locations of known diced miRNA on respective hairpin-shaped
miRNA precursors in order to determine a common
pattern in these locations, which can be used to predict
Dicer-cut locations on GAM folded precursor RNAs.

[0208] The Dicer-cut locations of known miRNA precursors are
obtained and studied. Locations of the 5' and/or 3' ends 
of the known diced miRNA oligonucleotides are preferably 
represented by their respective distances from the 5' end 
of the corresponding hairpin-shaped miRNA precursor. 
Additionally or alternatively, the 5' and/or 3' ends of the 
known diced miRNA oligonucleotides are preferably rep- 
resented by the relationship between their locations and 
the locations of one or more nucleotides along the hair- 
pin-shaped miRNA precursor. Additionally or alternatively, 
the 5' and/or 3' ends of the known diced miRNA oligonu- 
cleotides are preferably represented by the relationship 
between their locations and the locations of one or more 
bound nucleotide pairs along the hairpin-shaped miRNA 
precursor. Additionally or alternatively, the 5' and/or 3' 
ends of the known diced miRNA oligonucleotides are 
preferably represented by the relationship between their 
locations and the locations of one or more mismatched 
nucleotide pairs along the hairpin-shaped miRNA precursor.
Additionally or alternatively, the 5' and/or 3' ends of
the known diced miRNA oligonucleotides are preferably
represented by the relationship between their locations
and the locations of one or more unmatched nucleotides 
along the hairpin-shaped miRNA precursor. Additionally 
or alternatively, locations of the 5' and/or 3' ends of the 
known diced miRNA oligonucleotides are preferably rep- 
resented by their respective distances from the loop lo- 
cated at the center of the corresponding hairpin-shaped 
miRNA precursor. 

[0209] One or more of the foregoing location metrics may be
employed in the Dicer-cut location detector training & 
validation functionality 126. Additionally, metrics related 
to the nucleotide content of the diced miRNA and/or of 
the hairpin-shaped miRNA precursor may be employed. 

[0210] In a preferred embodiment of the present invention,
Dicer-cut location detector training & validation functionality
126 preferably employs standard machine learning
techniques known in the art to analyze
existing patterns in a given "training set" of examples.
Standard machine learning techniques are capable,
to a certain degree, of detecting patterns in examples to 
which they have not been previously exposed that are 
similar to those in the training set. Such machine learning 
techniques include, but are not limited to, neural networks,
Bayesian Modeling, Bayesian Networks, Support
Vector Machines (SVM), Genetic Algorithms, Markovian
Modeling, Maximum Likelihood Modeling, Nearest Neighbor
Algorithms, Decision Trees and other techniques, as is
well-known in the art.

[0211] In accordance with an embodiment of the present invention,
two or more classifiers or predictors based on the
abovementioned machine learning techniques are sepa- 
rately trained on the abovementioned training set, and are 
used jointly in order to predict the Dicer-cut location. As 
an example, Fig. 6B illustrates operation of two classifiers, 
a 3' end recognition classifier and a 5' end recognition 
classifier. Most preferably, the Dicer-cut location detector 
training & validation functionality 126 implements a "best- 
of-breed" approach employing a pair of classifiers based 
on the abovementioned Bayesian Modeling and Nearest 
Neighbor Algorithms, and accepting only "potential GAM 
RNAs" that score highly on one of these predictors. In this 
context, "high scores" means scores that have been 
demonstrated to have a low false positive rate when scoring
known miRNA oligonucleotides. Alternatively, the
Dicer-cut location detector training & validation functionality
126 may implement operation of more or fewer than
two classifiers.

[0212] Predictors used in a preferred embodiment of the present 
invention are further described hereinbelow with reference 
to Fig. 6C. A computer program listing of a computer pro- 
gram implementation of the Dicer-cut location detector 
training & validation functionality 126 is enclosed on an 
electronic medium in computer-readable form, and is 
hereby incorporated by reference herein. 

[0213] When evaluated on the abovementioned validation set of
440 published miRNA oligonucleotides using k-fold cross 
validation (Mitchell, 1997) with k = 3, the performance of 
the resulting predictors is as follows: In 70% of known 
miRNA oligonucleotides, a 5' end location is correctly de- 
termined by a Support Vector Machine predictor within up 
to two nucleotides; a Nearest Neighbor (EDIT DISTANCE) 
predictor achieves 56% accuracy (247/440); and a Two- 
Phased Predictor that uses Bayesian modeling (TWO 
PHASED) achieves 80% accuracy (352/440) when only the 
first phase is used. When the second phase (strand choice) 
is implemented by a naive Bayesian model, the accuracy is 
55% (244/440), and when the K-nearest-neighbor model- 
ing is used for the second phase, 374/440 decisions are 
made and the accuracy is 65% (242/374). A K-near- 



est-neighbor predictor (FIRST-K) achieves 61% accuracy 
(268/440). The accuracies of all predictors are consider- 
ably higher on top-scoring subsets of published miRNA 
oligonucleotides. 

[0214] Finally, in order to validate the efficacy and accuracy of 
the Dicer-cut location detector 116, a sample of novel 
oligonucleotides detected thereby is preferably selected, 
and validated by wet lab experiments. Laboratory results 
validating the efficacy of the Dicer-cut location detector 
116 are described hereinbelow with reference to Figs. 
13-15D, Fig. 18 and also in the enclosed file Table 12. 

[0215] Reference is now made to Fig. 6C, which is a simplified 
flowchart illustrating an operation of a Dicer-cut location 
detector 116 (Fig. 2), constructed and operative in accor- 
dance with a preferred embodiment of the present inven- 
tion. The Dicer-cut location detector 116 preferably com- 
prises a machine learning computer program module, 
which is trained to recognize Dicer-cut locations on 
known hairpin-shaped miRNA precursors, and based on 
this training, is operable to detect Dicer-cut locations of 
novel GAM RNA (Fig. 1) on GAM FOLDED PRECURSOR RNA 
(Fig. 1). In a preferred embodiment of the present invention,
the Dicer-cut location module preferably utilizes machine
learning algorithms, including but not limited to
Support Vector Machine, Bayesian modeling, Nearest 
Neighbors, and K-nearest-neighbor algorithms that are 
known in the art. 

[0216] When initially assessing a novel GAM FOLDED PRECURSOR
RNA, each 19-24 nt-long segment thereof is considered 
to be a potential GAM RNA, because the Dicer-cut location 
is initially unknown. 
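The enumeration of candidate segments described above can be sketched as follows. This is a minimal illustration only: the function name `candidate_segments` and the example precursor string are hypothetical and are not taken from the enclosed program listing.

```python
def candidate_segments(precursor, min_len=19, max_len=24):
    """Yield (start, end, segment) for every window of min_len..max_len nt.

    Positions are 0-based and `end` is exclusive. Each window is treated as
    a potential GAM RNA because the Dicer-cut location is not yet known.
    """
    n = len(precursor)
    for length in range(min_len, max_len + 1):
        for start in range(n - length + 1):
            yield start, start + length, precursor[start:start + length]


# Hypothetical 30 nt precursor, used only to exercise the function.
precursor = "ACGU" * 7 + "AC"
candidates = list(candidate_segments(precursor))
```

Each candidate produced this way would then be passed to the recognition classifiers or predictors described in the following paragraphs.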

[0217] For each such potential GAM RNA, the location of its 5' 
end or the locations of its 5' and 3' ends are scored by at 
least one recognition classifier or predictor, operating on 
features such as the following: Locations of the 5' and/or 3'
ends of the known diced miRNA oligonucleotides, which 
are preferably represented by their respective distances 
from the 5' end of the corresponding hairpin-shaped 
miRNA precursor. Additionally or alternatively, the 5' and/ 
or 3' ends of the known diced miRNA oligonucleotides, 
which are preferably represented by the relationship be- 
tween their locations and the locations of one or more nu- 
cleotides along the hairpin-shaped miRNA precursor. Ad- 
ditionally or alternatively, the 5' and/or 3' ends of the 
known diced miRNA oligonucleotides, which are preferably
represented by the relationship between their locations and
the locations of one or more bound nucleotide
pairs along the hairpin-shaped miRNA precursor. Addi- 
tionally or alternatively, the 5' and/or 3' ends of the 
known diced miRNA oligonucleotides, which are prefer- 
ably represented by the relationship between their loca- 
tions and the locations of one or more mismatched nu- 
cleotide pairs along the hairpin-shaped miRNA precursor. 
Additionally or alternatively, the 5' and/or 3' ends of the 
known diced miRNA oligonucleotides, which are prefer- 
ably represented by the relationship between their loca- 
tions and the locations of one or more unmatched nu- 
cleotides along the hairpin-shaped miRNA precursor. Ad- 
ditionally or alternatively, locations of the 5' and/or 3' 
ends of the known diced miRNA oligonucleotides, which 
are preferably represented by their respective distances 
from the loop located at the center of the corresponding 
hairpin-shaped miRNA precursor. Additionally or alterna- 
tively, metrics related to the nucleotide content of the 
diced miRNA and/or of the hairpin-shaped miRNA precur- 
sor. 

[0218] In a preferred embodiment of the present invention, the
Dicer-cut location detector 116 (Fig. 2) may use a Support
Vector Machine predictor.

[0219] In another preferred embodiment of the present
invention, the Dicer-cut location detector 116 (Fig. 2)
preferably employs an "EDIT DISTANCE" predictor, which seeks
sequences that are similar to those of known miRNA
oligonucleotides, utilizing a Nearest Neighbor algorithm,
where a similarity metric between two sequences is a variant
of the Edit Distance algorithm (Gusfield, 1997). The EDIT
DISTANCE predictor is based on an observation that miRNA
oligonucleotides tend to form clusters, the members of which
show marked sequence similarity.

[0220] In yet another preferred embodiment of the present
invention, the Dicer-cut location detector 116 (Fig. 2)
preferably uses a "TWO PHASE" predictor, which predicts the
Dicer-cut location in two distinct phases: (a) selecting a
double-stranded segment of the GAM FOLDED PRECURSOR RNA
(Fig. 1) comprising the GAM RNA by naive Bayesian modeling,
and (b) detecting which strand of the double-stranded
segment contains the GAM RNA (Fig. 1) by employing either
naive Bayesian or K-nearest-neighbor modeling. The
K-nearest-neighbor modeling is a variant of the "FIRST-K"
predictor described hereinbelow, with parameters optimized
for this specific task. The "TWO PHASE" predictor may be
operated in two modes: either utilizing only the first phase
and thereby producing two alternative Dicer-cut location
predictions, or utilizing both phases and thereby producing
only one final Dicer-cut location.

[0221] In still another preferred embodiment of the present
invention, the Dicer-cut location detector 116 preferably
uses a "FIRST-K" predictor, which utilizes a
K-nearest-neighbor algorithm. The similarity metric between
any two sequences is 1 - E/L, where L is a parameter,
preferably 8-10, and E is the edit distance between the two
sequences, taking into account only the first L nucleotides
of each sequence. If the K-nearest-neighbor scores of two or
more locations on the GAM FOLDED PRECURSOR RNA (Fig. 1) are
not significantly different, these locations are further
ranked by a Bayesian model, similar to the one described
hereinabove.
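The FIRST-K similarity metric 1 - E/L can be illustrated with the sketch below. It assumes a plain Levenshtein edit distance with unit costs; the exact edit-distance variant, the value of K, and the function names `edit_distance`, `first_k_similarity` and `k_nearest` are assumptions made for illustration and are not specified in the text above.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit cost for insert, delete and replace."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # replacement or match
        prev = curr
    return prev[-1]


def first_k_similarity(seq_a, seq_b, L=8):
    """Similarity 1 - E/L, where E is computed on the first L nt only."""
    return 1 - edit_distance(seq_a[:L], seq_b[:L]) / L


def k_nearest(query, known_sequences, k=3, L=8):
    """Return the k known sequences most similar to the query."""
    scored = [(first_k_similarity(query, s, L), s) for s in known_sequences]
    return sorted(scored, reverse=True)[:k]
```

Because only the first L nucleotides are compared, two sequences that agree over their first L positions receive a similarity of 1 regardless of how their remaining nucleotides differ.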

[0222] In accordance with an embodiment of the present
invention, scores of two or more of the abovementioned
classifiers or predictors are integrated, yielding an
integrated score for each potential GAM RNA. As an example,
Fig. 6C illustrates an integration of scores from two
classifiers, a 3' end recognition classifier and a 5' end
recognition classifier, the scores of which are integrated
to yield an integrated score. Most preferably, the
INTEGRATED SCORE of Fig. 6C implements a "best-of-breed"
approach employing a pair of classifiers and accepting only
"potential GAM RNAs" that score highly on one of the
abovementioned "EDIT DISTANCE" or "TWO PHASE" predictors. In
this context, "high scores" means scores that have been
demonstrated to have a low false positive value when scoring
known miRNA oligonucleotides. Alternatively, the INTEGRATED
SCORE may be derived from operation of more or fewer than
two classifiers.

[0223] The INTEGRATED SCORE is evaluated as follows: (a) the 
"potential GAM RNA" having the highest score is prefer- 
ably taken to be the most probable GAM RNA, and (b) if 
the integrated score of this most probable GAM RNA is 
higher than a pre-defined threshold, then the most prob- 
able GAM RNA is accepted as a PREDICTED GAM RNA. 
Preferably, this evaluation technique is not limited to the 
highest scoring potential GAM RNA. 
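The two-step evaluation of the INTEGRATED SCORE can be sketched as follows. The function name `select_predicted_gam`, the data layout and the threshold values are hypothetical; the text above does not state how the pre-defined threshold is chosen.

```python
def select_predicted_gam(scored_candidates, threshold):
    """Pick the top-scoring potential GAM RNA and apply the threshold.

    scored_candidates: list of (candidate, integrated_score) pairs.
    Returns the candidate accepted as a PREDICTED GAM RNA, or None if the
    best integrated score does not exceed the threshold.
    """
    if not scored_candidates:
        return None
    # Step (a): the highest-scoring candidate is the most probable GAM RNA.
    best, best_score = max(scored_candidates, key=lambda pair: pair[1])
    # Step (b): accept it only if its score is higher than the threshold.
    return best if best_score > threshold else None
```

A variant that returns every candidate above the threshold, rather than only the top one, would correspond to the remark that the technique is not limited to the highest scoring potential GAM RNA.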

[0224] In a preferred embodiment of the present invention,
PREDICTED GAM RNAs comprising a low complexity nucleotide
sequence (e.g., ATATATA) may optionally be filtered out,
because there is a high probability that they are part of a
repeated element in the DNA, and are therefore not
functional, as is known in the art. For each PREDICTED GAM
RNA sequence, the number of occurrences of each two nt
combination (AA, AT, AC, etc.) comprised in that sequence is
counted. PREDICTED GAM RNA sequences where the sum of the
counts of the two most frequent combinations is higher than
a threshold, preferably 8-10, are filtered out. As an
example, when the threshold is set such that 2% of the known
miRNA oligonucleotides are filtered out, 30% of the
predicted GAM RNAs are filtered out.
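The low complexity filter can be sketched as below. The sketch assumes that the two nt combinations are counted over overlapping windows, which the text above does not state explicitly; the function name `is_low_complexity` and the default threshold of 9 (chosen from within the preferred 8-10 range) are illustrative.

```python
from collections import Counter


def is_low_complexity(seq, threshold=9):
    """Return True if the two most frequent dinucleotides exceed threshold.

    Dinucleotides are counted over overlapping windows (an assumption).
    """
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    top_two = sum(count for _, count in pairs.most_common(2))
    return top_two > threshold


repeat_like = "ATATATATATATATATATATAT"   # 22 nt, only AT and TA occur
diverse = "ACGTTGCAAGCTAGGTCCATGA"       # 22 nt, many distinct dinucleotides
```

With these inputs, `repeat_like` is filtered out while `diverse` is retained, since its two most frequent dinucleotides together occur only four times.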

[0225] Reference is now made to Fig. 7A, which is a simplified
block diagram of a preferred implementation of the target
gene binding site detector 118 described hereinabove 
with reference to Fig. 2. The goal of the target gene bind- 
ing site detector 118 is to detect one or more binding 
sites located in 3'UTRs of the mRNA of a known gene, 
such as BINDING SITE I, BINDING SITE II and BINDING SITE 
III (Fig. 1), the nucleotide sequence of which binding sites 
is partially or fully complementary to a GAM RNA, thereby 
determining that the abovementioned known gene is a 
target gene of the GAM RNA. 

[0226] The target gene binding site detector 118 (Fig. 2) receives 
a plurality of Dicer-cut sequences from hairpin structures 
140 (Fig. 6A) and a plurality of potential target gene
sequences 142, which are derived from sequenced DNA data
104 (Fig. 2).

[0227] The target gene binding site detector training & validation 
functionality 128 (Fig. 3) is operative to train the target 
gene binding site detector 118 on known miRNA oligonu- 
cleotides and their respective target genes and to build a 
background model for an evaluation of the probability of 
achieving similar results randomly (P value) for the target 
gene binding site detector 118 results. The target gene 
binding site detector training & validation functionality 
128 constructs the model by analyzing both heuristically 
and computationally the results of the target gene binding 
site detector 118. 

[0228] Following operation of target gene binding site detector 
training & validation functionality 128 (Fig. 3), the target 
gene binding site detector 118 is operative to detect a 
plurality of potential novel target genes having binding 
site/s 144, the nucleotide sequence of which is partially or 
fully complementary to that of each of the plurality of 
Dicer-cut sequences from hairpin structures 140. Pre- 
ferred operation of the target gene binding site detector 
118 is further described hereinbelow with reference to 
Fig. 7B. 

[0229] Reference is now made to Fig. 7B, which is a simplified
flowchart illustrating a preferred operation of the target
gene binding site detector 118 of Fig. 2. 

[0230] In an embodiment of the present invention, the target
gene binding site detector 118 first compares nucleotide
sequences of each of the plurality of Dicer-cut sequences
from hairpin structures 140 (Fig. 6A) to the potential
target gene sequences 142 (Fig. 7A), such as 3'UTRs of known
mRNAs, in order to find crude potential matches. This step
may be performed using a simple alignment algorithm such as
BLAST.

[0231] Then, the target gene binding site detector 118 filters 
these crude potential matches, to find closer matches, 
which more closely resemble published miRNA oligonu- 
cleotide binding sites. 

[0232] Next, the target gene binding site detector 118 expands 
the nucleotide sequences of the 3'UTR binding site found 
by the sequence comparison algorithm (e.g. BLAST or EDIT 
DISTANCE). A determination is made whether any sub- 
sequence of the expanded sequence may improve the 
match. The best match is considered the alignment. 

[0233] Free-energy and spatial structure are computed for the
resulting binding sites. Calculation of spatial structure
may be performed by a secondary structure folding algorithm
based on free-energy minimization, such as the MFOLD
algorithm described in Mathews et al. (J. Mol. Biol. 288:
911-940 (1999)) and Zuker (Nucleic Acids Res. 31: 3406-3415
(2003)), the disclosures of which are hereby incorporated by
reference. Free-energy, spatial structure and the above
preferences are reflected in scoring. The resulting scores
are compared with scores characteristic of known binding
sites of published miRNA oligonucleotides, and each binding
site is given a score that reflects its resemblance to these
known binding sites.

[0234] Finally, the target gene binding site detector 118 analyzes
the spatial structure of the binding site. Each 3'UTR-GAM 
oligonucleotide pair is given a score. Multiple binding 
sites of the same GAM oligonucleotides to a 3'UTR are 
given higher scores than those that bind only once to a 
3'UTR. 

[0235] In a preferred embodiment of the present invention,
performance of the target gene binding site detector 118
may be improved by integrating several of the abovemen- 
tioned logical steps, using the methodology described 
hereinbelow. 

[0236] For each of the Dicer-cut sequences from hairpin
structures 140, its starting segment, e.g. a segment
comprising the first 8 nts from its 5' end, is obtained. For
each starting segment, all of the 9 nt segments that are
highly complementary to the starting segment are calculated.
These calculated segments are referred to here as "potential
binding site end segments". In a preferred embodiment of the
present invention, for each 8 nt starting segment, the
potential binding site end segments are all 9 nt segments
whose complementary sequence contains a 7-9 nt sub-sequence
that differs from the starting segment by no more than an
insertion, deletion or replacement of one nt. Calculation of
potential binding site end segments is preferably performed
by a pre-processing tool that maps all possible 8 nt
segments to their respective 9 nt segments.
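The membership test for a "potential binding site end segment" can be sketched as a predicate. The sketch assumes that the "complementary sequence" means the Watson-Crick reverse complement in the RNA alphabet, with no G:U wobble pairing, and that a plain unit-cost edit distance is used; the function names are hypothetical. It tests one pair at a time rather than precomputing the full 8 nt to 9 nt mapping.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string (assumed meaning)."""
    return rna.translate(COMPLEMENT)[::-1]


def edit_distance(a, b):
    """Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_potential_end_segment(start8, seg9):
    """True if the complement of seg9 holds a 7-9 nt sub-sequence that is
    within one insertion, deletion or replacement of start8."""
    target = reverse_complement(seg9)
    for length in (7, 8, 9):
        for i in range(len(target) - length + 1):
            if edit_distance(start8, target[i:i + length]) <= 1:
                return True
    return False
```

A pre-processing tool of the kind mentioned above could apply this predicate, or an equivalent enumeration of one-edit neighbours, to build the lookup table once and reuse it for every 3'UTR.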
[0237] Next, the mRNA's 3'UTR is parsed into all the segments,
with the same length as the potential binding site end
segments, preferably 9 nt segments, comprised in the
3'UTR. The location of each such segment is noted, stored in a
performance-efficient data structure and compared to the
potential binding site end segments calculated in the
previous step.
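One possible "performance-efficient data structure" for this step is a hash map from each 9 nt segment to the positions at which it occurs. The sketch below is an assumption about how such an index might look; the function names `index_utr_segments` and `find_end_segment_hits` are hypothetical.

```python
from collections import defaultdict


def index_utr_segments(utr, seg_len=9):
    """Map every seg_len-nt segment of the 3'UTR to its 0-based positions."""
    index = defaultdict(list)
    for pos in range(len(utr) - seg_len + 1):
        index[utr[pos:pos + seg_len]].append(pos)
    return index


def find_end_segment_hits(index, potential_end_segments):
    """Return {segment: positions} for end segments present in the 3'UTR."""
    return {seg: index[seg] for seg in potential_end_segments if seg in index}


utr = "ACGUACGUACGUA"  # hypothetical 13 nt 3'UTR fragment
hits = find_end_segment_hits(index_utr_segments(utr),
                             {"ACGUACGUA", "GGGGGGGGG"})
```

Building the index is linear in the 3'UTR length, and each lookup of a potential binding site end segment is then a constant-time dictionary access on average.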

[0238] The target gene binding site detector 118 then expands
the binding site sequence, preferably in the binding site 5'
direction (i.e. immediately upstream), assessing the degree
of its alignment to the Dicer-cut sequence from hairpin
structures 140. Preferably, an alignment algorithm is
implemented which uses specific weighting parameters 
based on an analysis of known miRNA oligonucleotide 
binding sites. As an example, it is apparent that a good 
match of the 3' end of the binding site is critically impor- 
tant, a match of the 5' end is less important but can com- 
pensate for a small number of mismatches at the 3' end of 
the binding site, and a match of the middle portion of the 
binding site is much less important. 
[0239] Next, the number of binding sites found in a specific
3'UTR, the degree of alignment of each of these binding
sites, and their proximity to each other are assessed and
compared to these properties found in known binding sites of
published miRNA oligonucleotides. In a preferred embodiment,
the fact that many of the known binding sites are clustered
is used to evaluate the P value of obtaining a cluster of a
few binding sites on the same target gene 3'UTR in the
following way. The detector scans different score thresholds
and calculates for each threshold the number and positions
of possible binding sites with a score above the threshold.
It then obtains a P value for each threshold from a
preprocessed calculated background matrix, described
hereinbelow, and from the combination of the number and
positions of binding sites. The output score for each pair
of Dicer-cut sequences from hairpin structures 140 and
potential target gene sequences 142 is the minimal P value,
normalized with the number of threshold trials using a
Bernoulli distribution. Pairs with a low P value are
preferred.
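The normalization of the minimal P value by the number of threshold trials is not spelled out above. One plausible reading, offered here only as an assumption, treats each threshold as an independent Bernoulli trial and reports the probability that at least one of n trials would reach a P value as low as the observed minimum. The function name `corrected_p_value` is hypothetical.

```python
def corrected_p_value(p_values):
    """Correct the minimal P value for the number of thresholds scanned.

    Assumes independent trials: the chance that at least one of n trials
    yields a P value <= p_min is 1 - (1 - p_min) ** n.
    """
    n = len(p_values)
    p_min = min(p_values)
    return 1.0 - (1.0 - p_min) ** n
```

Under this reading, scanning more thresholds raises the corrected value, so a pair is not rewarded merely because many thresholds were tried.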

[0240] As mentioned hereinabove, for each target gene, a
preprocessed calculated background matrix is built. The
matrix includes rows for each number of miRNA
oligonucleotide binding sites (in the preferred embodiment,
the matrix includes 7 rows to accommodate 0 to 6 binding
sites), and columns for each different score threshold (in
the preferred embodiment, the matrix includes 5 columns for
5 different thresholds). Each matrix cell, corresponding to
a specific number of binding sites and a specific threshold,
is set to be the probability of getting an equal or higher
number of binding sites and an equal or higher score using
random 22 nt-long sequences with the same nucleotide
distribution as known miRNA oligonucleotides (29.5% T, 24.5%
A, 25% G and 21% C). Those probabilities are calculated by
running the above procedure for 10000 random sequences that
preserve the known miRNA nucleotide distribution (these
sequences are also referred to as miRNA oligonucleotide
random sequences). The P value can be estimated as the
number of random sequences that satisfy the matrix cell
requirement divided by the total number of random sequences
(10000). In the preferred embodiment, 2 matrices are
calculated. The P values of the second matrix are calculated
under a constraint that at least two of the binding site
positions are under a heuristically-determined constant
value. The values of the first matrix are calculated without
this constraint. The target gene binding site detector 118
uses the second matrix if the binding site positions agree
with the constraint. Otherwise, it uses the first. In an
alternative embodiment, only one matrix is calculated
without any constraint on the binding site positions.
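The unconstrained background matrix can be estimated by simulation along the following lines. The binding-site scorer is not reproduced here: `site_scores` is a hypothetical callable, supplied by the caller, that returns one score per binding site found for a sequence on a 3'UTR. The function names are likewise illustrative.

```python
import random

# Nucleotide distribution of known miRNA oligonucleotides, as quoted above.
NUCLEOTIDE_WEIGHTS = {"T": 0.295, "A": 0.245, "G": 0.25, "C": 0.21}


def random_mirna_like(length=22, rng=random):
    """Draw a random sequence that follows the quoted distribution."""
    letters = list(NUCLEOTIDE_WEIGHTS)
    weights = list(NUCLEOTIDE_WEIGHTS.values())
    return "".join(rng.choices(letters, weights=weights, k=length))


def build_background_matrix(utr, site_scores, thresholds,
                            max_sites=6, n_random=10000, rng=random):
    """Estimate P values by simulation.

    Cell [row][col] is the fraction of random sequences having at least
    `row` binding sites that score at or above thresholds[col].
    """
    counts = [[0] * len(thresholds) for _ in range(max_sites + 1)]
    for _ in range(n_random):
        scores = site_scores(random_mirna_like(rng=rng), utr)
        for col, thr in enumerate(thresholds):
            n_sites = min(sum(1 for s in scores if s >= thr), max_sites)
            # A sequence with n_sites sites satisfies every row <= n_sites.
            for row in range(n_sites + 1):
                counts[row][col] += 1
    return [[c / n_random for c in row] for row in counts]
```

With five thresholds and `max_sites=6`, the result has the 7 rows and 5 columns described above; the first row is always 1.0 because every sequence has at least zero binding sites.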
[0241] A test performed using the target gene binding site
detector 118 shows that all of the known miRNA
oligonucleotide target genes are found using this algorithm
with a P value of less than 0.5%. Running known miRNA
oligonucleotides against 3400 potential target gene 3'UTR
sequences yields on average 32 target genes for each miRNA
oligonucleotide with a P value of less than 0.5%, while
background sequences, as well as inverse or complement
sequences of known miRNA oligonucleotides (which preserve
their high order sequence statistics), yield, as expected,
17 target genes on average. This result reflects that the
algorithm has the ability to detect real target genes with
47% accuracy.
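The 47% figure is consistent with treating the 17 background hits as the expected number of false positives among the 32 hits obtained for a real miRNA oligonucleotide. This reading is an inference from the quoted numbers rather than a stated formula, and the function name below is hypothetical.

```python
def estimated_true_fraction(hits_with_mirna, hits_with_background):
    """Fraction of hits in excess of what background sequences produce."""
    return (hits_with_mirna - hits_with_background) / hits_with_mirna


print(round(100 * estimated_true_fraction(32, 17)))  # prints 47
```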

[0242] Finally, orthology data may optionally be used to further 
prefer binding sites based on their conservation. Prefer- 
ably, this may be used in cases such as (a) where both the 
target mRNA and miRNA oligonucleotide have orthologues 
in another organism, e.g. Human-Mouse orthology, or (b) 
where a miRNA oligonucleotide (e.g. viral miRNA oligonu- 
cleotide) targets two mRNAs in orthologous organisms. In 
such cases, binding sites that are conserved are preferred. 

[0243] In accordance with another preferred embodiment of the
present invention, binding sites may be searched by a
reverse process. Sequences of K (preferably 22) nucleotides
in a UTR of a target gene are assessed as potential binding
sites. A sequence comparison algorithm, such as BLAST or an
EDIT DISTANCE variant, is then used to search elsewhere in
the genome for partially or fully complementary sequences
that are found in known miRNA oligonucleotides or
computationally-predicted GAM oligonucleotides. Only
complementary sequences that meet predetermined spatial
structure and free-energy criteria, as described
hereinabove, are accepted. Clustered binding sites are
strongly preferred, and potential binding sites and
potential GAM oligonucleotides that occur in
evolutionarily-conserved genomic sequences are also
preferred. Scoring of candidate binding sites takes into
account free-energy and spatial structure of the binding
site complexes, as well as the aforesaid preferences.

[0244] The 3'UTR of each bacterial gene is extracted from the
500 nts that lie downstream of the gene-coding region. Care
is taken that the extracted 3'UTR is not partly covered by
the predicted 5'UTR of the next gene-coding region,
considered to be the 300 nts upstream of that region. This
method is applied to known (not hypothetical) bacterial
genes of completed pathogenic eubacterial genomes taken from
the updated NCBI RefSeq database on 17 Mar 2004.
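The extraction rule can be sketched as below for genes on the forward strand. The sketch assumes that an overlapping 3'UTR is trimmed rather than discarded, which the text above does not specify; the function name `extract_3utr` and its parameters are hypothetical.

```python
def extract_3utr(genome, cds_end, next_cds_start=None,
                 utr_len=500, next_5utr_len=300):
    """Return up to utr_len nt downstream of a coding region.

    cds_end: 0-based index just past the coding region (forward strand).
    next_cds_start: 0-based start of the next coding region, or None.
    The window is trimmed (an assumption) so that it does not enter the
    predicted 5'UTR of the next gene, taken as next_5utr_len nt upstream.
    """
    end = min(cds_end + utr_len, len(genome))
    if next_cds_start is not None:
        end = min(end, max(cds_end, next_cds_start - next_5utr_len))
    return genome[cds_end:end]
```

For a gene ending at position 100 with the next coding region starting at position 700, the sketch returns the 300 nts from position 100 to 400, leaving the 300 nts upstream of the next gene untouched.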

[0245] Reference is now made to Fig. 8, which is a simplified
flowchart illustrating a preferred operation of the function
& utility analyzer 120 described hereinabove with reference
to Fig. 2. The goal of the function & utility analyzer 120
is to determine if a potential target gene is in fact a
valid clinically useful target gene. Since a potential novel
GAM oligonucleotide binding a binding site in the UTR of a
target gene is understood to inhibit expression of that
target gene, if that target gene is shown to have a valid
clinical utility, then it follows that the potential novel
oligonucleotide itself also has a valid useful function,
which is the opposite of that of the target gene.

[0246] The function & utility analyzer 120 preferably receives
as input a plurality of potential novel target genes having
binding site/s 144 (Fig. 7A), generated by the target gene
binding site detector 118 (Fig. 2). Each potential
oligonucleotide is evaluated as follows: First, the system
checks to see if the function of the potential target gene
is scientifically well established. Preferably, this can be
achieved bioinformatically by searching various published
data sources presenting information on known functions of
proteins. Many such data sources exist and are published, as
is well known in the art. Next, for those target genes the
function of which is scientifically known and is well
documented, the system then checks if scientific research
data exists which links them to known diseases. For example,
a preferred embodiment of the present invention utilizes the
OMIM(TM) (Hamosh et al, 2002) database published by NCBI,
which summarizes research publications relating to genes
which have been shown to be associated with diseases.
Finally, the specific possible utility of the target gene is
evaluated. While this process too may be facilitated by
bioinformatic means, it might require manual evaluation of
published scientific research regarding the target gene, in
order to determine the utility of the target gene to the
diagnosis and/or treatment of a specific disease. Only
potential novel oligonucleotides, the target genes of which
have passed all three examinations, are accepted as novel
oligonucleotides.

[0247] Reference is now made to Fig. 9, which is a simplified
diagram describing each of a plurality of novel
bioinformatically-detected regulatory polynucleotides,
referred to in this Table as the Genomic Record (GR)
polynucleotide. GR encodes an operon-like cluster of novel
miRNA-like oligonucleotides, each of which in turn modulates
expression of at least one target gene. The function and
utility of at least one target gene is known in the art.

[0248] The GR PRECURSOR is a novel, bioinformatically-detected, 
regulatory, non-protein-coding polynucleotide. The 
method by which the GR PRECURSOR is detected is described
hereinabove with additional reference to Figs. 1-9.

[0249] GR PRECURSOR is preferably encoded by the bacterial
genome and contains a cluster of novel bacterial
oligonucleotides, which preferably bind to human target genes or
to bacterium genes. Alternatively or additionally, GR PRE- 
CURSOR is encoded by the human genome and contains a 
cluster of novel human oligonucleotides, which preferably 
bind to bacterial target genes or to human genes. 

[0250] The GR PRECURSOR encodes GR PRECURSOR RNA that is
typically several hundred to several thousand nts long. 
The GR PRECURSOR RNA folds spatially, forming the GR 
FOLDED PRECURSOR RNA. It is appreciated that the GR 
FOLDED PRECURSOR RNA comprises a plurality of what is 
known in the art as hairpin structures. Hairpin structures 
result from the presence of segments of the nucleotide 
sequence of GR PRECURSOR RNA in which the first half of 
each such segment has a nucleotide sequence which is at 
least a partial, and sometimes an accurate, reverse- 
complement sequence of the second half thereof, as is 
well known in the art. 

[0251] The GR FOLDED PRECURSOR RNA is naturally processed by
cellular enzymatic activity into a plurality of separate GAM
precursor RNAs, herein schematically represented by GAM1
FOLDED PRECURSOR RNA through GAM3 FOLDED PRECURSOR RNA. Each
GAM folded precursor RNA is a hairpin-shaped RNA segment,
corresponding to GAM FOLDED PRECURSOR RNA of Fig. 1.

[0252] The abovementioned GAM folded precursor RNAs are diced
by DICER COMPLEX of Fig. 1, yielding short RNA segments of
about 22 nts in length, schematically represented by GAM1
RNA through GAM3 RNA. Each GAM RNA corresponds to GAM RNA of
Fig. 1. GAM1 RNA, GAM2 RNA and GAM3 RNA each bind
complementarily to binding sites located in the untranslated
regions of their respective target genes, designated GAM1
TARGET RNA, GAM2 TARGET RNA and GAM3 TARGET RNA,
respectively. These target binding sites correspond to
BINDING SITE I, BINDING SITE II and BINDING SITE III of
Fig. 1. The binding of each GAM RNA to its target RNA
inhibits the translation of its respective target protein,
designated GAM1 TARGET PROTEIN, GAM2 TARGET PROTEIN and GAM3
TARGET PROTEIN, respectively.

[0253] It is appreciated that the specific functions, and
accordingly the utilities, of the GR polynucleotide are
correlated with and may be deduced from the identity of the
target genes that are inhibited by GAM RNAs that are present
in the operon-like cluster of the polynucleotide. Thus, for
the GR polynucleotide, these are the target genes of the
proteins schematically represented by GAM1 TARGET PROTEIN
through GAM3 TARGET PROTEIN, which are inhibited by the GAM
RNAs. The function of these target genes is elaborated in
Table 8, hereby incorporated herein.

[0254] Reference is now made to Fig. 10, which is a block
diagram illustrating different utilities of oligonucleotides
of the novel group of oligonucleotides of the present
invention, referred to here as GAM oligonucleotides and GR
polynucleotides. The present invention discloses a first
plurality of novel oligonucleotides referred to here as GAM
oligonucleotides and a second plurality of operon-like
polynucleotides referred to here as GR polynucleotides, each
of the GR polynucleotides encoding a plurality of GAM
oligonucleotides. The present invention further discloses a
very large number of known target genes, which are bound by,
and the expression of which is modulated by, each of the
novel oligonucleotides of the present invention. Published
scientific data referenced by the present invention provides
specific, substantial, and credible evidence that the
abovementioned target genes, modulated by novel
oligonucleotides of the present invention, are associated
with various diseases. Specific novel oligonucleotides of
the present invention, target genes thereof and diseases
associated therewith are described hereinbelow with
reference to Tables 1 through 12. It is therefore
appreciated that a function of GAM oligonucleotides and GR
polynucleotides of the present invention is modulation of
expression of target genes related to known bacterial
diseases, and that therefore utilities of novel
oligonucleotides of the present invention include diagnosis
and treatment of the abovementioned diseases.
[0255] Fig. 10 describes various types of diagnostic and
therapeutic utilities of novel oligonucleotides of the
present invention. A utility of novel oligonucleotides of
the present invention is detection of GAM oligonucleotides
and of GR polynucleotides. It is appreciated that since GAM
oligonucleotides and GR polynucleotides modulate expression
of disease related target genes, detection of expression of
GAM oligonucleotides in clinical scenarios associated with
said bacterial diseases is a specific, substantial and
credible utility. Diagnosis of novel oligonucleotides of the
present invention may preferably be implemented by RNA
expression detection techniques, including but not limited
to biochips, as is well known in the art. Diagnosis of
expression of oligonucleotides of the present invention may
be useful for research purposes, in order to further
understand the connection between the novel oligonucleotides
of the present invention and the abovementioned related
bacterial diseases, for disease diagnosis and prevention
purposes, and for monitoring disease progress.

[0256] Another utility of novel oligonucleotides of the present in- 
vention is anti-GAM therapy, a mode of therapy which al- 
lows up regulation of a bacterial disease-related target 
gene of a novel GAM oligonucleotide of the present inven- 
tion, by lowering levels of the novel GAM oligonucleotide 
which naturally inhibits expression of that target gene. 
This mode of therapy is particularly useful with respect to 
target genes which have been shown to be under-ex- 
pressed in association with a specific bacterial disease. 
Anti-GAM therapy is further discussed hereinbelow with 
reference to Figs. 11A and 11B. 

[0257] A further utility of novel oligonucleotides of the
present invention is GAM replacement therapy, a mode of
therapy which achieves down regulation of a bacterial
disease-related target gene of a novel GAM oligonucleotide
of the present invention, by raising levels of the GAM which
naturally inhibits expression of that target gene. This mode
of therapy is particularly useful with respect to target
genes which have been shown to be over-expressed in
association with a specific bacterial disease. GAM
replacement therapy involves introduction of supplementary
GAM products into a cell, or stimulation of a cell to
produce excess GAM products. GAM replacement therapy may
preferably be achieved by transfecting cells with an
artificial DNA molecule encoding a GAM, which causes the
cells to produce the GAM product, as is well known in the
art.
[0258] Yet a further utility of novel oligonucleotides of the
present invention is modified GAM therapy. Disease
conditions are likely to exist in which a mutation in a
binding site of a GAM RNA prevents the natural GAM RNA from
effectively binding and inhibiting a bacterial
disease-related target gene, causing up regulation of that
target gene, and thereby contributing to the disease
pathology. In such conditions, a modified GAM
oligonucleotide is designed which effectively binds the
mutated GAM binding site, i.e. is an effective anti-sense of
the mutated GAM binding site, and is introduced into
disease-affected cells. Modified GAM therapy is preferably
achieved by transfecting cells with an artificial DNA
molecule encoding the modified GAM, which causes the cells
to produce the modified GAM product, as is well known in the
art.
[0259] Reference is now made to Figs. 11A and 11B, which are 
simplified diagrams which when taken together illustrate 
anti-GAM therapy mentioned hereinabove with reference 
to Fig. 10. A utility of novel GAMs of the present invention 
is anti-GAM therapy, a mode of therapy which allows up 
regulation of a bacterial disease-related target gene of a 
novel GAM of the present invention, by lowering levels of 
the novel GAM which naturally inhibits expression of that 
target gene. Fig. 11A shows a normal GAM inhibiting 
translation of a target gene by binding of GAM RNA to a 
BINDING SITE found in an untranslated region of GAM 
TARGET RNA, as described hereinabove with reference to 
Fig. 1. 

[0260] pig. 11B shows an example of anti-GAM therapy. ANTI- 
GAM RNA is short artificial RNA molecule the sequence of 
which is an anti-sense of GAM RNA. Anti-GAM treatment 
comprises transfecting diseased cells with ANTI-GAM 
RNA, or with a DNA encoding thereof. The ANTI-GAM RNA 
binds the natural GAM RNA, thereby preventing binding of 
natural GAM RNA to its BINDING SITE. This prevents natu- 
ral translation inhibition of GAM TARGET RNA by GAM 
RNA, thereby up regulating expression of GAM TARGET
PROTEIN.

[0261] It is appreciated that anti-GAM therapy is particularly useful
with respect to target genes which have been shown to
be under-expressed in association with a specific bacte- 
rial disease. 

[0262] Furthermore, anti-GAM therapy is particularly useful,
since it may be used in situations in which technologies
known in the art as RNAi and siRNA cannot be utilized. As
is known in the art, RNAi and siRNA are technologies
which offer means for artificially inhibiting expression of a
target protein, by artificially designed short RNA segments
which bind complementarily to mRNA of said target protein.
However, RNAi and siRNA cannot be used to directly
up regulate translation of target proteins.

[0263] Reference is now made to Fig. 12A, which is a bar graph 
illustrating performance results of the hairpin detector 
114 (Fig. 2) constructed and operative in accordance with 
a preferred embodiment of the present invention. 

[0264] Fig. 12A illustrates efficacy of several features used by the
hairpin detector 114 to detect GAM FOLDED PRECURSOR
RNAs (Fig. 1). The values of each of these features are compared
between a set of published miRNA precursor
oligonucleotides, represented by shaded bars, and a set of
random hairpins folded from the human genome denoted
hereinbelow as a hairpin background set, represented by
white bars. The published miRNA precursor oligonucleotides
set is taken from RFAM database, Release 2.1
and includes 148 miRNA oligonucleotides from H. sapiens.
The background set comprises a set of 10,000 hairpins
folded from the human genome.
[0265] It is appreciated that the hairpin background set is expected
to comprise some valid, previously undetected
hairpin-shaped miRNA precursor-like GAM FOLDED PRE- 
CURSOR RNAs of the present invention, and many hairpin- 
shaped sequences that are not hairpin-shaped miRNA- 
like precursors. 

[0266] For each feature, the bars depict the percent of known 
miRNA hairpin precursors (shaded bars) and the percent 
of background hairpins (white bars) that pass the thresh- 
old for that feature. The percent of known miRNA 
oligonucleotides that pass the threshold indicates the 
sensitivity of the feature, while the corresponding back- 
ground percent implies the specificity of the feature, al- 
though not precisely, because the background set com- 
prises both true and false examples. 

[0267] The first bar pair, labeled Thermodynamic Stability Selection,
depicts hairpins that have passed the selection of
"families" of closely related hairpin structures, as de- 
scribed hereinabove with reference to Fig. 5B. 

[0268] The second bar pair, labeled Hairpin Score, depicts hair- 
pins that have been selected by hairpin detector 114 (Fig. 
5B), regardless of the "families" selection. 

[0269] The third bar pair, labeled Conserved, depicts hairpins
that are conserved in human, mouse and rat (UCSC Goldenpath
(TM) HG16 database).

[0270] The fourth bar pair, labeled Expressed, depicts hairpins 
that are found in EST blocks. 

[0271] The fifth bar pair, labeled Integrated Selection, depicts
hairpin structures predicted by a preferred embodiment of
the present invention to be valid GAM PRECURSORS. In a
preferred embodiment of the present invention, a hairpin 
may be considered to be a GAM PRECURSOR if its hairpin 
detector score is above 0, and it is in one of the following 
groups: a) in an intron and conserved or b) in an inter- 
genic region and conserved or c) in an intergenic region 
and expressed, as described below. Further filtering of
GAM precursors may be obtained by selecting hairpins with
a high score of Dicer-cut location detector 116 as described
hereinabove with reference to Figs. 6A-6C, and
with predicted miRNA oligonucleotides, which pass the
low complexity filter as described hereinabove, and whose
targets are selected by the target gene binding site detector
118 as described hereinabove with reference to Figs.
7A-7B.
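
By way of illustration only, the Integrated Selection rule described above may be sketched as follows. The class name, field names and region labels are hypothetical and are not taken from the described embodiment; only the rule itself (a hairpin detector score above 0, combined with one of the three listed groups) follows the text.

```python
from dataclasses import dataclass


@dataclass
class Hairpin:
    # All field names are illustrative assumptions, not part of the source.
    score: float      # hairpin detector score
    region: str       # "intron" or "intergenic" (hypothetical labels)
    conserved: bool   # conserved in human, mouse and rat
    expressed: bool   # found in EST blocks


def passes_integrated_selection(h: Hairpin) -> bool:
    """Apply the rule: score above 0 and one of groups (a), (b) or (c)."""
    if h.score <= 0:
        return False
    if h.region == "intron":
        return h.conserved                  # group (a)
    if h.region == "intergenic":
        return h.conserved or h.expressed   # groups (b) and (c)
    return False
```

Note that under this reading an intronic hairpin which is expressed but not conserved is not selected, since none of the three groups covers that case.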

[0272] It is appreciated that these results validate the sensitivity
and specificity of the hairpin detector 114 (Fig. 2) in iden- 
tifying novel GAM FOLDED PRECURSOR RNAs, and in ef- 
fectively distinguishing them from the abundant hairpins 
found in the genome. 

[0273] Reference is now made to Fig. 12B, which is a line graph 
illustrating accuracy of a Dicer-cut location detector 116 
(Fig. 2) constructed and operative in accordance with a 
preferred embodiment of the present invention. 

[0274] To determine the accuracy of the Dicer-cut location detector
116, a stringent training and test set was chosen
from the abovementioned set of 440 known miRNA
oligonucleotides, such that no two miRNA oligonucleotides
in the set are homologous. This was performed
to get a lower bound on the accuracy and avoid effects of
similar known miRNA oligonucleotides appearing in both
the training and test sets. On this stringent set of size
204, k-fold cross validation with k=3 was performed to
determine the percent of known miRNA oligonucleotides
in which the Dicer-cut location detector 116 described
hereinabove predicted the correct miRNA oligonucleotide
up to two nucleotides from the correct location. The accuracy
of the TWO PHASED predictor is depicted in the
graph. The accuracy of the first phase of the TWO PHASED
predictor is depicted by the upper line, and that of both
phases of the TWO PHASED predictor is depicted by the
lower line. Both are binned by the predictor score, where
the score is the score of the first stage.
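
The evaluation described above may be sketched as follows, purely as an illustration. The function names are hypothetical, and the way the 204 oligonucleotides are assigned to folds is an assumption, since the text states only that three-fold cross validation was performed and that a prediction counts as correct when it lies up to two nucleotides from the true location.

```python
def within_tolerance(predicted_start, true_start, tolerance=2):
    """A prediction counts as correct if it is at most `tolerance` nt off."""
    return abs(predicted_start - true_start) <= tolerance


def accuracy(pairs, tolerance=2):
    """Fraction of (predicted, true) start positions within the tolerance."""
    if not pairs:
        return 0.0
    hits = sum(1 for p, t in pairs if within_tolerance(p, t, tolerance))
    return hits / len(pairs)


def k_fold_indices(n, k=3):
    """Split indices 0..n-1 into k disjoint test folds (assumed scheme).

    Returns a list of (train_indices, test_indices) tuples, one per fold.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits
```

For the stringent set of size 204 and k=3, this scheme yields three test folds of 68 oligonucleotides each, with the remaining 136 used for training in each round.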

[0275] It is appreciated that these results validate the accuracy of
the Dicer-cut location detector 116. 

[0276] Reference is now made to Fig. 12C, which is a bar graph 
illustrating the performance results of the target gene 
binding site detector 118 (Fig. 7A) constructed and opera- 
tive in accordance with a preferred embodiment of the 
present invention. 

[0277] Fig. 12C illustrates specificity and sensitivity of the target
gene binding site detector 118. The values presented are
the result of testing 10,000 artificial miRNA oligonucleotide
sequences (random 22 nt sequences with the
same base composition as published miRNA oligonucleotide
sequences). Adjusting the threshold parameters to
fulfill 90% sensitivity of validated, published miRNA-3'UTR
pairs requires the P VAL of potential target gene sequences-Dicer-cut
sequences to be less than 0.01 and
also the P VAL of potential target ortholog gene sequences-Dicer-cut
sequences to be less than 0.05. The
target gene binding site detector 118 can filter out 99.7%
of potential miRNA/gene pairs, leaving only the 0.3% that
contain the most promising potential miRNA/gene pairs.
Limiting the condition for the P VAL of potential target ortholog
gene sequences-Dicer-cut sequences to be less
than 0.01 reduces the sensitivity ratio to 70% but filters
out more than 50% of the remaining 0.3%, to a final ratio
of less than 0.15%.
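
The following sketch illustrates the two ingredients of this test under stated assumptions. Shuffling a single published sequence is only one possible way of producing a random 22 nt sequence with the same base composition, and is assumed here for illustration; the function and parameter names are likewise hypothetical. The threshold defaults are the two P VAL cutoffs given in the text.

```python
import random


def shuffled_sequence(seq, rng):
    """Return a random permutation of seq, preserving its base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def passes_thresholds(p_target, p_ortholog,
                      target_cutoff=0.01, ortholog_cutoff=0.05):
    """Keep a miRNA/gene pair only if both P VAL conditions are met."""
    return p_target < target_cutoff and p_ortholog < ortholog_cutoff


# Filtering out 50% of the remaining 0.3% of pairs leaves 0.15% of all pairs;
# filtering out more than 50% therefore leaves less than 0.15%.
remaining_after_stricter_cutoff = 0.003 * 0.5
```

Tightening `ortholog_cutoff` from 0.05 to 0.01 corresponds to the stricter condition described above, which trades sensitivity (90% down to 70%) for a smaller surviving fraction.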
[0278] It is appreciated that these results validate the sensitivity
and specificity of the target gene binding site detector 
118. 

[0279] Reference is now made to Fig. 13, which is a summary ta- 
ble of laboratory results validating the expression of 29 
novel human GAM RNA oligonucleotides in HeLa cells or, 
alternatively, in liver or thymus tissues detected by the 
bioinformatic oligonucleotide detection engine 100 (Fig. 
2). 

[0280] As a positive control, we used a reference set of eight
known human miRNA oligonucleotides: hsa-MIR-21; hsa-
MIR-27b; hsa-MIR-186; hsa-MIR-93; hsa-MIR-26a; hsa- 
MIR-191; hsa-MIR-31; and hsa-MIR-92. All positive con- 
trols were successfully validated by sequencing. 

[0281] The table of Fig. 13 lists all GAM RNA predictions whose 
expression was validated. The field "Primer Sequence" 
contains the "specific" part of the primer; the field "Se- 
quenced sequence" represents the nucleotide sequence 
detected by cloning (excluding the hemispecific primer 
sequence); the field "Predicted GAM RNA" contains the 
GAM RNA predicted sequence; the field "Distance" indicates
the distance from the primer, that is, the number of mismatches
between the "specific" region of the primer and the corresponding
part of the GAM RNA sequence; the field "GAM
Name" contains GAM RNA PRECURSOR ID followed by "A" 
or "B", which represents the GAM RNA position on the 
precursor as elaborated in the attached Tables. 
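
As an illustration of the "Distance" field, the number of mismatches between the "specific" region of a primer and the corresponding part of a GAM RNA sequence may be counted as sketched below. The function name and the `offset` parameter are hypothetical, and treating U and T as equivalent is an assumption made so that RNA and DNA notations can be compared directly.

```python
def count_mismatches(primer_specific, gam_rna, offset=0):
    """Count mismatched positions between the specific part of a primer and
    the same-length stretch of the GAM RNA starting at `offset`."""
    def normalize(s):
        # Compare RNA and DNA notation on equal terms (assumption).
        return s.upper().replace("U", "T")

    primer = normalize(primer_specific)
    region = normalize(gam_rna)[offset:offset + len(primer)]
    if len(region) != len(primer):
        raise ValueError("primer extends beyond the GAM RNA sequence")
    return sum(1 for a, b in zip(primer, region) if a != b)
```

A distance of 0 corresponds to a primer whose specific region matches the predicted GAM RNA exactly.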

[0282] A primer was designed such that its first half, the 5' region,
is complementary to the adaptor sequence and its
second half, the 3' region, anneals to the 5' terminus of
GAM RNA sequence, yielding a hemispecific primer (as
elaborated hereinbelow in the Methods section). A sample
of 13 predicted GAM RNA sequences was examined by
PCR using hemispecific primers and a primer specific to
the 3' adaptor. PCR products were cloned into plasmid 
vectors and then sequenced. For all 13 predicted GAM 
RNA sequences, the GAM RNA sequence found in the 
hemispecific primer plus the sequence observed between 
the hemispecific primer and the 3' adaptor was completely 
included in the expected GAM RNA sequence (rows 1-7, 
and 29). The rest are GAM RNA predictions that were veri- 
fied by cloning and sequencing, yet, by using a primer 
that was originally designed for a slightly different predic- 
tion. 

[0283] It is appreciated that failure to detect a predicted oligonucleotide
in the lab does not necessarily indicate a mistaken
bioinformatic prediction. Rather, it may be due to
technical sensitivity limitation of the lab test, or because 
the predicted oligonucleotides are not expressed in the 
tissue examined, or at the development phase tested. The 
observed GAM RNAs may be strongly expressed in HeLa 
cells while the original GAM RNAs are expressed at low 
levels in HeLa cells or not expressed at all. Under such 
circumstances, primer sequences containing up to three 
mismatches from a specific GAM RNA sequence may amplify
it. Thus, we also considered cases in which differences
of up to 3 mismatches in the hemispecific primer
occur.

[0284] The 3' terminus of observed GAM RNA sequences is often 
truncated or extended by one or two nucleotides. Cloned 
sequences that were sequenced from both 5' and 3' termini
have an asterisk appended to the row number.

[0285] Interestingly, the primer sequence followed by the observed
cloned sequence is contained within five GAM RNA
sequences of different lengths, which belong to 24 precursors
derived from distinct loci (Row 29). Out of these, one
precursor appears four times in the genome and its corre- 
sponding GAM Names are 351973-A, 352169-A, 
352445-A and 358164-A. 

[0286] The sequence presented in Row 29 is a representative of
the group of five GAM RNAs. The full list of GAM RNA sequences
and their corresponding precursors is as follows
(each GAM RNA sequence is followed by the GAM Name):
TCACTGCAACCTCCACCTCCCA (352092, 352651, 355761),
TCACTGCAACCTCCACCTCCCG (351868,
352440, 351973, 352169, 352445, 358164, 353737,
352382, 352235, 352232, 352268, 351919, 352473,
352444, 353638, 353004, 352925, 352943),
TCACTGCAACCTCCACCTCCTG (358311),
TCACTGCAACCTCCACCTTCAG (353323), and
TCACTGCAACCTCCACCTTCCG (353856).
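
The relationship among the five listed GAM RNA sequences can be made explicit by computing the stretch they share from the 5' end, as in the sketch below. The helper name is hypothetical; the sequences are those listed above.

```python
def common_prefix(seqs):
    """Return the longest prefix shared by all sequences."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return "".join(prefix)


ROW_29_GROUP = [
    "TCACTGCAACCTCCACCTCCCA",
    "TCACTGCAACCTCCACCTCCCG",
    "TCACTGCAACCTCCACCTCCTG",
    "TCACTGCAACCTCCACCTTCAG",
    "TCACTGCAACCTCCACCTTCCG",
]

shared = common_prefix(ROW_29_GROUP)
```

The five sequences share their first 18 nucleotides and differ only within the last four positions, which is consistent with a single primer and cloned sequence being contained in all of them.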
[0287] METHOD SECTION 

[0288] CELL LINES 

[0289] Three common human cell lines, obtained from Dr. Yonat 
Shemer at Soroka Medical Center, Be'er Sheva, Israel, were 
used for RNA extraction: Human Embryonic Kidney HEK-293
cells, Human Cervix Adenocarcinoma HeLa cells and
Human Prostate Carcinoma PC3 cells.

[0290] RNA PURIFICATION 

[0291] Several sources of RNA were used to prepare libraries: 

[0292] Total HeLa S100 RNA was prepared from HeLa S100 cellular
fraction (4C Biotech, Belgium) through an SDS
(1%)-Proteinase K (200 microgram/ml) 30 minute incubation at 37°C
followed by an acid Phenol-Chloroform purification and
isopropanol precipitation (Sambrook et al; Molecular
Cloning - A Laboratory Manual).

[0293] Total HeLa, HEK-293 and PC3 cell RNA was prepared using
the standard Tri-Reagent protocol (Sigma) according
to the manufacturer's instructions, except that 1 volume 
of isopropanol was substituted with 3 volumes of ethanol. 

[0294] Nuclear and Cytoplasmic RNA was prepared from HeLa or
HEK-293 cells in the following manner:

[0295] Cells were washed and harvested in ice-cold PBS and precipitated
in a swing-out rotor at 1200 rpm at 4°C for 5
minutes. Pellets were loosened by gentle vortexing. 4ml of
"NP40 lysis buffer" (10mM Tris-HCl, 5mM MgCl2, 10mM
NaCl, 0.5% Nonidet P40, 1mM Spermidine, 1mM DTT,
140U/ml rRNasin) was then added per 5x10^7 cells. Cells
and lysis buffer were incubated for 5 minutes on ice and
centrifuged in a swing-out rotor at 500xg at 4°C for 5
minutes. Supernatant, termed cytoplasm, is carefully removed
to a tube containing SDS (1% final) and proteinase-K
(200 microgram/ml final). Pellet, termed nuclear fraction, is rewashed
and incubated with a similar amount of fresh lysis
buffer. Lysis is monitored visually under a microscope at
this stage, typically for 5 minutes. Nuclei are pelleted in a
swing-out rotor at 500xg at 4°C for 5 minutes. Supernatant
is pooled, incubated at 37°C for 30 minutes, Phenol/Chloroform-extracted,
and RNA is alcohol-precipitated
(Sambrook et al). Nuclei are loosened and then
homogenized immediately in >10 volumes of Tri-Reagent
(Sigma). Nuclear RNA is then prepared according to the
manufacturer's instructions.

[0296] TOTAL TISSUE RNA 



[0297] Total tissue RNA was obtained from Ambion USA, and in- 
cluded Human Liver, Thymus, Placenta, Testes and Brain. 
[0298] RNA SIZE FRACTIONATION 

[0299] RNA used for libraries was always size-fractionated. Fractionation
was done by loading up to 500 microgram RNA
per YM100 Amicon Microcon column (Millipore) followed
by a 500xg centrifugation for 40 minutes at 4°C. Flow-through
"YM100" RNA is about one quarter of the total
RNA and was used for library preparation or fractionated
further by loading onto a YM30 Amicon Microcon column
(Millipore) followed by a 13,500xg centrifugation for 25
minutes at 4°C. Flow-through "YM30" was used for library
preparation "as is" and consists of less than 0.5% of total
RNA. Additional size fractionation was achieved during library
preparation.

[0300] LIBRARY PREPARATION 

[0301] Two types of cDNA libraries, designated "One-tailed" and
"Ligation", were prepared from one of the abovementioned
fractionated RNA samples. RNA was dephosphorylated
and ligated to an RNA (designated with lowercase
letters)-DNA (designated with UPPERCASE letters) hybrid
5'-phosphorylated, 3' idT blocked 3'-adapter
(5'-P-uuuAACCGCATCCTTCTC-idT-3' Dharmacon # P-002045-01-05)
(as elaborated in Elbashir et al., Genes
Dev. 15:188-200 (2001)) resulting in ligation only of
RNase III type cleavage products. 3'-Ligated RNA was excised
and purified from a half 6%, half 13% polyacrylamide
gel to remove excess adapter with a Nanosep 0.2 micron
centrifugal device (Pall) according to instructions, and
precipitated with glycogen and 3 volumes of ethanol. Pellet
was resuspended in a minimal volume of water.

[0302] For the "Ligation" library, a DNA (UPPERCASE)-RNA
(lowercase) hybrid 5'-adapter
(5'-TACTAATACGACTCACTaaa-3' Dharmacon # P-002046-01-05)
was ligated to the 3'-adapted RNA, reverse
transcribed with "EcoRI-RT"
(5'-GACTAGCTGGAATTCAAGGATGCGGTTAAA-3'), PCR-amplified
with two external primers essentially as in Elbashir
et al. (2001), except that primers were "EcoRI-RT"
and "PstI Fwd" (5'-CAGCCAACGCTGCAGATACGACTCACTAAA-3').
This PCR product was used as a template for a second
round of PCR with one hemispecific and one external
primer or with two hemispecific primers.
[0303] For the "One-tailed" library, the 3'-adapted RNA was annealed
to 20pmol primer "EcoRI RT" by heating to 70°C
and cooling 0.1°C/sec to 30°C and then reverse-transcribed
with Superscript II RT (according to manufacturer's
instructions, Invitrogen) in a 20 microliters volume for 10
alternating 5 minute cycles of 37°C and 45°C. Subsequently,
RNA was digested with 1 microliter 2M NaOH and
2mM EDTA at 65°C for 10 minutes. cDNA was loaded on a
polyacrylamide gel, excised and gel-purified from excess
primer as above (invisible, judged by primer run alongside)
and resuspended in 13 microliters of water. Purified
cDNA was then oligo-dC tailed with 400U of recombinant
terminal transferase (Roche Molecular Biochemicals), 1
microliter 100 microM dCTP, 1 microliter 15mM CoCl2,
and 4 microliters reaction buffer, to a final volume of 20
microliters for 15 minutes at 37°C. Reaction was stopped
with 2 microliters 0.2M EDTA and 15 microliters 3M
NaOAc pH 5.2. Volume was adjusted to 150 microliters
with water, Phenol:Bromochloropropane 10:1 extracted
and subsequently precipitated with glycogen and 3 volumes
of ethanol. C-tailed cDNA was used as a template
for PCR with the external primers "T3-PstBsg(G/I)18"
(5'-AATTAACCCTCACTAAAGGCTGCAGGTGCAGGIGGGIIGGGIIGGGIIGN-3',
where I stands for Inosine and N for any of the 4 possible
deoxynucleotides), and with "EcoRI Nested"
(5'-GGAATTCAAGGATGCGGTTA-3'). This PCR
product was used as a template for a second round of PCR
with one hemispecific and one external primer or with two
hemispecific primers.
[0304] PRIMER DESIGN AND PCR 

[0305] Hemispecific primers were constructed for each predicted 
GAM RNA oligonucleotide by an in-house program de- 
signed to choose about half of the 5' or 3' sequence of the 
GAM RNA corresponding to a TM of about 30-34°C constrained
by an optimized 3' clamp, appended to the
cloning adapter sequence (for "One-tailed" libraries,
5'-GGNNGGGNNG on the 5' end or TTTAACCGCATC-3' on
the 3' end of the GAM RNA; for "Ligation" libraries, the
same 3' adapter and 5'-CGACTCACTAAA on the 5' end of
the GAM RNA). Consequently, a fully complementary
primer of a TM higher than 60°C was created covering
only one half of the GAM RNA sequence permitting the 
unbiased elucidation by sequencing of the other half. 
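
A minimal sketch of how the specific half of such a primer might be chosen is given below. It is not the in-house program referred to above: the TM is estimated with the simple Wallace rule (2°C per A/T and 4°C per G/C), which is an assumption, the 3' clamp optimization is omitted, and all function names are hypothetical. Only the "Ligation" library 5' adapter sequence is taken from the text.

```python
LIGATION_5P_ADAPTER = "CGACTCACTAAA"  # 5' adapter part stated in the text


def wallace_tm(seq):
    """Rough TM estimate in deg C: 2 per A/T(U), 4 per G/C (assumed rule)."""
    seq = seq.upper()
    at = sum(seq.count(b) for b in "ATU")
    gc = sum(seq.count(b) for b in "GC")
    return 2 * at + 4 * gc


def specific_half(gam_rna, tm_min=30, tm_max=34):
    """Shortest 5' stretch of the GAM RNA whose estimated TM reaches the
    30-34 window; returns (stretch, tm) or None if the window is missed."""
    for length in range(1, len(gam_rna) + 1):
        prefix = gam_rna[:length]
        tm = wallace_tm(prefix)
        if tm >= tm_min:
            return (prefix, tm) if tm <= tm_max else None
    return None


def hemispecific_primer(gam_rna):
    """Adapter part followed by the specific 5' half of the GAM RNA."""
    half = specific_half(gam_rna)
    if half is None:
        raise ValueError("no 5' stretch with a TM in the requested window")
    return LIGATION_5P_ADAPTER + half[0]
```

For the Row 29 sequence TCACTGCAACCTCCACCTCCCA, this sketch selects the first ten nucleotides as the specific half, leaving the remaining twelve to be elucidated by sequencing.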

[0306] For each primer, the following criteria were used: Primers
were graded according to the TM of the primer half and
the nucleotide content of 3 nucleotides of the 3' clamp
from worst to best, roughly: GGG-3' < CCC-3'
< TTT-3'/AAA-3' < GG-3' < CC-3' < a TM lower than 30 <
a TM higher than 34 < TT-3'/AA-3' < 3 G/C nucleotide
combination < 3 A/T nucleotide combination < any combination
of two/three different nucleotides < any combination
of three/three different nucleotides.
[0307] VALIDATION OF PCR PRODUCT BY SOUTHERN BLOT

[0308] GAM RNA oligonucleotides were validated by hybridization 
of Polymerase Chain Reaction (PCR)-product Southern 
blots with a probe to the predicted GAM RNA. 

[0309] PCR product sequences were confirmed by Southern blot
(Southern E.M., Biotechnology 1992,24:122-139 (1975))
and hybridization with DNA oligonucleotide probes synthesized
as complementary (antisense) to predicted GAM
RNA oligonucleotides. Gels were transferred onto a Biodyne
PLUS 0.45 micron (Pall) positively charged nylon membrane
and UV cross-linked. Hybridization was performed
overnight with DIG-labeled probes at 42°C in DIG Easy-Hyb
buffer (Roche). Membranes were washed twice with
2xSSC and 0.1% SDS for 10 minutes at 42°C and then
washed twice with 0.5xSSC and 0.1% SDS for 5 min at 42°C.
The membrane was then developed by using a DIG luminescent
detection kit (Roche) using anti-DIG and CSPD
reaction, according to the manufacturer's protocol. All
probes were prepared according to the manufacturer's
(Roche Molecular Biochemicals) protocols: Digoxigenin
(DIG) labeled antisense transcripts were prepared from
purified PCR products using a DIG RNA labeling kit with
T3 RNA polymerase. DIG-labeled PCR was prepared by using
a DIG PCR labeling kit. 3'-DIG-tailed oligo ssDNA antisense
probes, containing DIG-dUTP and dATP at an average
tail length of 50 nts, were prepared from 100 pmole
oligonucleotides with the DIG Oligonucleotide Labeling
Kit. Control reactions contained all of the components of
the test reaction except library template.
[0310] VALIDATION OF PCR PRODUCT BY NESTED PCR ON THE 
LIGATION 

[0311] To further validate predicted GAM PCR product sequence
derived from hemi-primers, a PCR-based diagnostic technique
was devised to amplify only those products containing
at least two additional nucleotides of the non hemi-primer
defined part of the predicted GAM RNA oligonucleotide.
In essence, a diagnostic primer was designed so
that its 3' end, which is the specificity determining side,
was identical to the desired GAM RNA oligonucleotide,
2-10 nts (typically 4-7, chosen for maximum specificity)
further into its 3' end than the nucleotide stretch primed
by the hemi-primer. The hemi-primer PCR product was 
first ligated into a T-cloning vector (pTZ57/T or pGEM-T) 
as described hereinabove. The ligation reaction mixture 
was used as template for the diagnostic PCR under strict 
annealing conditions with the new diagnostic primer in 
conjunction with a general plasmid-homologous primer, 
resulting in a distinct ~200 base-pair product. This PCR
product can be directly sequenced, permitting the eluci- 
dation of the remaining nucleotides up to the 3' of the 
mature GAM RNA oligonucleotide adjacent to the 3' 
adapter. Alternatively, following analysis of the diagnostic 
PCR reaction on an agarose gel, positive ligation reactions 
(containing a band of the expected size) were transformed 
into E. coli. Using this same diagnostic technique and as 
an alternative to screening by Southern blot colony hy- 
bridization, transformed bacterial colonies were screened 
by colony-PCR (Gussow, D. and Clackson, T, Nucleic Acids 
Res. 17:4000 (1989)) with the nested primer and the vec- 
tor primer, prior to plasmid purification and sequencing. 
[0312] VALIDATION OF PCR PRODUCT BY CLONING AND SE- 
QUENCING 

[0313] PCR products were inserted into pGEM-T (Promega) or
pTZ57/T (MBI Fermentas), heat-shock transformed into
competent JM109 E. coli (Promega) and seeded on LB-Ampicillin
plates with IPTG and Xgal. White and light blue
colonies were transferred to duplicate gridded plates, one 
of which was blotted onto a membrane (Biodyne Plus, Pall) 
for hybridization with DIG tailed oligo probes (according 
to instructions, Roche) complementary to the expected 
GAM. Plasmid DNA from positive colonies was sequenced. 

[0314] It is appreciated that the results summarized in Fig. 13 validate
the efficacy of the bioinformatic oligonucleotide detection
engine 100 of the present invention.

[0315] Reference is now made to Fig. 14A, which is a schematic 
representation of a novel human GR polynucleotide, lo- 
cated on chromosome 9, comprising 2 known human 
miRNA oligonucleotides - MIR24 and MIR23, and 2 novel 
GAM oligonucleotides, herein designated GAM7617 and 
GAM252 (later discovered by other researchers as hsa- 
mir-27b), all marked by solid black boxes. Fig. 14A also 
schematically illustrates 6 non-GAM hairpin sequences, 
and one non-hairpin sequence, all marked by white 
boxes, and serving as negative controls. By "non-GAM 
hairpin sequences" is meant sequences of a similar length 
to known miRNA precursor sequences, which form a hairpin
secondary folding pattern similar to miRNA precursor
hairpins, and yet which are assessed by the bioinformatic 
oligonucleotide detection engine 100 not to be valid GAM 
PRECURSOR hairpins. It is appreciated that Fig. 14A is a 
simplified schematic representation, reflecting only the 
order in which the segments of interest appear relative to 
one another, and not a proportional distance between the 
segments. 

[0316] Reference is now made to Fig. 14B, which is a schematic 
representation of secondary folding of each of the MIRs 
and GAMs of the GR (MIR24, MIR23, GAM7617 and
GAM252), and of the negative control non-GAM hairpins,
herein designated N2, N3, N252, N4, N6 and N7. N0 is a
non-hairpin control, of a similar length to that of known
miRNA precursor hairpins. It is appreciated that the negative
controls are situated adjacent to and in between real
miRNA oligonucleotides and GAM predicted oligonucleotides
and demonstrate similar secondary folding patterns
to that of known MIRs and GAMs.

[0317] Reference is now made to Fig. 14C, which is a picture of 
laboratory results of a PCR test upon a YM100 size- 
fractionated "ligation" library, utilizing a set of specific 
primer pairs located directly inside the boundaries of the
hairpins. Due to the nature of the library the only PCR-amplifiable
products can result from RNase III type enzyme
cleaved RNA, as expected for legitimate hairpin precursors
presumed to be produced by DROSHA (Lee et al, Nature
425:415-419, 2003). Fig. 14C demonstrates expression
of hairpin precursors of known miRNA oligonucleotides
hsa-mir23 and hsa-mir24, and of novel GAM7617 and GAM252
hairpins predicted bioinformatically
by a system constructed and operative in accordance
with a preferred embodiment of the present invention.
Fig. 14C also shows that none of the 7 controls (6
hairpins designated N2, N3, N23, N4, N6 and N7 and 1
non-hairpin sequence designated N0) were expressed.
N252 is a negative control sequence partially overlapping 
GAM252. 

[0318] In the picture, test lanes including template are designated
"+" and the control lane is designated "-". The control
reaction contained all the components of the test reaction
except library template. It is appreciated that for
each of the tested hairpins, a clear PCR band appears in 
the test ("+") lane, but not in the control ("-") lane. 

[0319] Figs. 14A through 14C, when taken together, validate the
efficacy of the bioinformatic oligonucleotide detection engine
in: (a) detecting known miRNA oligonucleotides; (b)
detecting novel GAM PRECURSOR hairpins which are found 
adjacent to these miRNA oligonucleotides, and which de- 
spite exhaustive prior biological efforts and bioinformatic 
detection efforts, went undetected; (c) discerning between 
GAM (or MIR) PRECURSOR hairpins, and non-GAM hair- 
pins. 

[0320] It is appreciated that the ability to discern GAM-hairpins
from non-GAM-hairpins is very significant in detecting 
GAM oligonucleotides since hairpins are highly abundant 
in the genome. Other miRNA prediction programs have 
not been able to address this challenge successfully. 

[0321] Reference is now made to Fig. 15A, which is an annotated 
sequence of an EST comprising a novel GAM oligonucleotide
detected by the oligonucleotide detection system
of the present invention. Fig. 15A shows the nucleotide
sequence of a known human non-protein-coding
EST (Expressed Sequence Tag), identified as EST72223. 
The EST72223 clone obtained from TIGR database 
(Kirkness and Kerlavage, 1997) was sequenced to yield the 
above 705bp transcript with a polyadenyl tail. It is appre- 
ciated that the sequence of this EST comprises sequences 
of one known miRNA oligonucleotide, identified as hsa-MIR98,
and of one novel GAM oligonucleotide referred to
here as GAM25, detected by the bioinformatic oligonu- 
cleotide detection engine 100 (Fig. 2) of the present in- 
vention. 

[0322] The sequences of the precursors of the known MIR98 and 
of the predicted GAM25 precursors are marked in bold, 
the sequences of the established miRNA 98 and of the 
predicted miRNA-like oligonucleotide GAM25 are under- 
lined. 

[0323] Reference is now made to Figs. 15B, 15C and 15D, which 
are pictures of laboratory results, which when taken to- 
gether demonstrate laboratory confirmation of expression 
of the bioinformatically-detected novel oligonucleotide of 
Fig. 15A. In two parallel experiments, an enzymatically
synthesized, capped EST72223 RNA transcript was incubated
with HeLa S100 lysate for 0 minutes, 4 hours and 24
hours. RNA was subsequently harvested, run on a denaturing
polyacrylamide gel, and reacted with either a 102
nt antisense MIR98 probe or a 145 nt antisense GAM25
precursor transcript probe respectively. The Northern blot
results of these experiments demonstrated processing of
EST72223 RNA by HeLa lysate (lanes 2-4, in Figs. 15B and
15C), into ~80bp and ~22bp segments, which reacted
with the MIR98 precursor probe (Fig. 15B), and into
~100bp and ~24bp segments, which reacted with the
GAM25 precursor probe (Fig. 15C). These results demonstrate
the processing of EST72223 by HeLa lysate into
MIR98 precursor and GAM25 precursor. It is also appreciated
from Fig. 15C (lane 1) that HeLa lysate itself reacted
with the GAM25 precursor probe, in a number of bands,
including a ~100bp band, indicating that
GAM25-precursor is endogenously expressed in HeLa
cells. The presence of additional bands, higher than
100bp in lanes 5-9, probably corresponds to the presence
of nucleotide sequences in HeLa lysate, which contain the
GAM25 sequence.
[0324] In addition, in order to demonstrate the kinetics and
specificity of the processing of MIR98 and GAM25 precursors
into their respective mature, "diced" segments, transcripts
of MIR98 and of the bioinformatically predicted
GAM25 precursors were similarly incubated with HeLa
S100 lysate, for 0 minutes, 30 minutes, 1 hour and 24
hours, and for 24 hours with the addition of EDTA, added
to inhibit Dicer activity, following which RNA was harvested,
run on a polyacrylamide gel and reacted with
MIR98 and GAM25 precursor probes. Capped transcripts
were prepared for in vitro RNA cleavage assays with T7
RNA polymerase, including a m7G(5')ppp(5')G-capping reaction
using the T7-mMessage mMachine kit (Ambion).
Purified PCR products were used as template for the reaction.
These were amplified for each assay with specific
primers containing a T7 promoter at the 5' end and a T3
RNA polymerase promoter at the 3' end. Capped RNA
transcripts were incubated at 30°C in supplemented, dialysis
concentrated, HeLa S100 cytoplasmic extract (4C
Biotech, Seneffe, Belgium). The HeLa S100 was supplemented
by dialysis to a final concentration of 20mM
Hepes, 100mM KCl, 2.5mM MgCl2, 0.5mM DTT, 20% glycerol
and protease inhibitor cocktail tablets (Complete mini
Roche Molecular Biochemicals). After addition of all components,
final concentrations were 100mM capped target
RNA, 2mM ATP, 0.2mM GTP, 500U/ml RNasin, 25 microgram/ml
creatine kinase, 25mM creatine phosphate,
2.5mM DTT and 50% S100 extract. Proteinase K, used to
enhance Dicer activity (Zhang et al., EMBO J. 21,
5875-5885 (2002)), was dissolved in 50mM Tris-HCl pH 8,
5mM CaCl2, and 50% glycerol, and was added to a final concentration
of 0.6 mg/ml. Cleavage reactions were stopped
by the addition of 8 volumes of proteinase K buffer
(200mM Tris-HCl, pH 7.5, 25mM EDTA, 300mM NaCl, and
2% SDS) and incubated at 65°C for 15min at different time
points (0, 0.5, 1, 4, 24h) and subjected to phenol/chloroform
extraction. Pellets were dissolved in water and
kept frozen. Samples were analyzed on a segmented half
6%, half 13% polyacrylamide 1XTBE-7M Urea gel.

[0325] The Northern blot results of these experiments
demonstrated an accumulation of a ~22bp segment which
reacted with the MIR98 precursor probe, and of a ~24bp
segment which reacted with the GAM25 precursor probe,
over time (lanes 5-8). Absence of these segments when
incubated with EDTA (lane 9), which is known to inhibit
Dicer enzyme (Zhang et al., 2002), supports the notion
that the processing of MIR98 and GAM25 precursors into
their "diced" segments is mediated by Dicer enzyme,
found in HeLa lysate. Other RNases do not utilize divalent
cations and are thus not inhibited by EDTA. The molecular
sizes of EST72223, MIR-98 and GAM25 and their
corresponding precursors are indicated by arrows.

[0326] Fig. 15D presents Northern blot results of the same
experiments described above with a GAM25 probe (24 nt).
The results clearly demonstrated the accumulation of mature
GAM25 oligonucleotide after 24 h.



[0327] To validate the identity of the band shown by the lower
arrow in Figs. 15C and 15D, an RNA band parallel to a
marker of 24 bases was excised from the gel, cloned as
in Elbashir et al (2001), and sequenced. Ninety clones
corresponded to the sequence of mature GAM25
oligonucleotide, three corresponded to GAM25* (the opposite
arm of the hairpin with a 1-3 nt 3' overhang) and two to
the hairpin-loop.

[0328] GAM25 was also validated endogenously by sequencing
from both sides from HeLa YM100 total-RNA "ligation"
libraries, utilizing hemispecific primers as described in
Fig. 13.

[0329] Taken together, these results validate the presence and
processing of a novel miRNA-like oligonucleotide,
GAM25, which was predicted bioinformatically. The
processing of this novel GAM oligonucleotide product by
HeLa lysate, from EST72223 through its precursor to its
final form, was similar to that observed for the known miRNA
oligonucleotide MIR98.

[0330] Transcript products were 705 nt (EST72223), 102 nt
(MIR98 precursor), and 125 nt (GAM25 precursor) long.
EST72223 was PCR-amplified with T7-EST72223 forward
primer:
5'-TAATACGACTCACTATAGGCCCTTATTAGAGGATTCTGCT-3'
and T3-EST72223 reverse primer:
5'-AATTAACCCTCACTAAAGGTTTTTTTTTCCTGAGACAGAGT-3'.
MIR98 was PCR-amplified using EST72223 as a template
with T7MIR98 forward primer:
5'-TAATACGACTCACTATAGGGTGAGGTAGTAAGTTGTATTGTT-3'
and T3MIR98 reverse primer:
5'-AATTAACCCTCACTAAAGGGAAAGTAGTAAGTTGTATAGTT-3'.
GAM25 was PCR-amplified using EST72223 as a template
with GAM25 forward primer:
5'-GAGGCAGGAGAATTGCTTGA-3' and T3-EST72223 reverse
primer:
5'-AATTAACCCTCACTAAAGGCCTGAGACAGAGTCTTGCTC-3'.
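
As an illustration only, the promoter tags carried by the primers of paragraph [0330] can be checked programmatically. The sketch below assumes that the shared 5' prefixes visible in the quoted primers (TAATACGACTCACTATAGG for T7 and AATTAACCCTCACTAAAGG for T3) are the promoter portions; the function name `promoter_tag` and the dictionary layout are hypothetical and are not part of the described protocol.

```python
# Shared 5' prefixes observed in the primers quoted in the text (assumed to be
# the promoter portions of those primers).
T7_PREFIX = "TAATACGACTCACTATAGG"
T3_PREFIX = "AATTAACCCTCACTAAAGG"


def promoter_tag(primer: str) -> str:
    """Return 'T7', 'T3' or 'none' according to the primer's 5' prefix."""
    seq = primer.upper().replace(" ", "")
    if seq.startswith(T7_PREFIX):
        return "T7"
    if seq.startswith(T3_PREFIX):
        return "T3"
    return "none"


# Primer sequences copied from the text (5' to 3').
primers = {
    "T7-EST72223 forward": "TAATACGACTCACTATAGGCCCTTATTAGAGGATTCTGCT",
    "T7MIR98 forward": "TAATACGACTCACTATAGGGTGAGGTAGTAAGTTGTATTGTT",
    "T3MIR98 reverse": "AATTAACCCTCACTAAAGGGAAAGTAGTAAGTTGTATAGTT",
    "GAM25 forward": "GAGGCAGGAGAATTGCTTGA",
}

tags = {name: promoter_tag(seq) for name, seq in primers.items()}
```

In this sketch the GAM25 forward primer carries no promoter prefix, which is consistent with its sequence as quoted.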

[0331] It is appreciated that the data presented in Figs. 15A, 15B,
15C and 15D, when taken together, validate the function of
the bioinformatic oligonucleotide detection engine 100 of
Fig. 2. Fig. 15A shows a novel GAM oligonucleotide
bioinformatically detected by the bioinformatic oligonucleotide
detection engine 100, and Figs. 15C and 15D show
laboratory confirmation of the expression of this novel
oligonucleotide. This is in accord with the engine training
and validation methodology described hereinabove with
reference to Fig. 2.

[0332] Reference is now made to Figs. 16A-C, which schemati- 
cally represent three methods that are employed to iden- 
tify GAM FOLDED PRECURSOR RNA from libraries. Each 
method involves the design of specific primers for PCR 
amplification followed by sequencing. The libraries in- 
clude hairpins as double-stranded DNA with two different 
adaptors ligated to their 5' and 3' ends. 

[0333] Reference is now made to Fig. 16A, which depicts a first 
method that uses primers designed to the stems of the 
hairpins. Since the stem of the hairpins often has bulges, 
mismatches, as well as G-T pairing, which is less signifi- 
cant in DNA than is G-U pairing in the original RNA hair- 
pin, the primer pairs were engineered to have the lowest 
possible match to the other strand of the stem. Thus, the 
F-Stem primer, derived from the 5' stem region of the 
hairpin, was chosen to have minimal match to the 3' stem 
region of the same hairpin. Similarly, the R-stem primer, 
derived from the 3' region of the hairpin (reverse comple- 
mentary to its sequence), was chosen to have minimal 
match to the 5' stem region of the same hairpin. The
F-Stem primer was extended in its 5' sequence with the T3
primer (5'-ATTAACCCTCACTAAAGGGA-3') and the R-Stem
primer was extended in its 5' sequence with the T7
primer (5'-TAATACGACTCACTATAGGG). The extension is
needed to obtain a large enough fragment for direct se- 
quencing of the PCR product. Sequence data from the am- 
plified hairpins is obtained in two ways. One way is the di- 
rect sequencing of the PCR products using the T3 primer 
that matches the extension of the F-Stem primer. Another 
way is the cloning of the PCR products into a plasmid, fol- 
lowed by PCR screening of individual bacterial colonies 
using a primer specific to the plasmid vector and either 
the R-Loop (Fig. 16B) or the F-Loop (Fig. 16C) primer. 
Positive PCR products are then sent for direct sequencing 
using the vector-specific primer. 
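
The derivation of the two stem primers and their extensions can be sketched as follows. This is a minimal illustration under stated assumptions: the hairpin is given as a DNA string, the primer length of 18 nt is a hypothetical default, and the selection for minimal match to the opposite strand described above is not implemented. The helper names are illustrative only.

```python
# Extension sequences quoted in the text for the F-Stem and R-Stem primers.
T3_EXTENSION = "ATTAACCCTCACTAAAGGGA"
T7_EXTENSION = "TAATACGACTCACTATAGGG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stem_primers(hairpin_dna: str, primer_len: int = 18):
    """Return (extended F-Stem primer, extended R-Stem primer).

    The F-Stem primer is taken from the 5' stem region; the R-Stem primer is
    the reverse complement of the 3' stem region. primer_len is an assumed
    value, not one given in the text.
    """
    if len(hairpin_dna) < 2 * primer_len:
        raise ValueError("hairpin is too short for the requested primer length")
    f_stem = hairpin_dna[:primer_len].upper()
    r_stem = reverse_complement(hairpin_dna[-primer_len:])
    return T3_EXTENSION + f_stem, T7_EXTENSION + r_stem


# Toy hairpin sequence, for illustration only.
f_primer, r_primer = stem_primers("AAACCCGGGTTTACGT", primer_len=6)
```

For a perfectly paired stem the two unextended primers would be identical, which illustrates why primers with minimal match to the opposite strand are preferred.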
[0334] Reference is now made to Fig. 16B, which depicts a sec- 
ond method in which R-Stem primer and R-Loop primers 
are used in a nested-PCR approach. First, PCR is per- 
formed with the R-Stem primer and the primer that 
matches the 5' adaptor sequence (5-ad primer). PCR 
products are then amplified in a second PCR using the R- 
Loop and 5-ad primers. As mentioned hereinabove, se- 
quence data from the amplified hairpins is obtained in two 
ways. One way is the direct sequencing of the PCR prod- 
ucts using the 5-ad primer. Another way is the cloning of 



the PCR products into a plasmid, followed by PCR screen- 
ing of individual bacterial colonies using a primer specific 
to the plasmid vector and F-Stem primer. Positive PCR 
products are then sent for direct sequencing using the 
vector-specific primer. It should be noted that optionally 
an extended R-Loop primer is designed that includes a T7
sequence extension, as described hereinabove (Fig. 16A) 
for the R-Stem primer. This is important in the first se- 
quencing option in cases where the PCR product is too 
short for sequencing. 
[0335] Reference is now made to Fig. 16C, which depicts a third 
method, which is the exact reverse of the second method 
described hereinabove (Fig. 16B). F-Stem and F-Loop 
primers are used in a nested-PCR approach. First, PCR is 
performed with the F-Stem primer and the primer that 
matches the 3' adaptor sequence (3-ad primer). PCR 
products are then amplified in a second PCR using the F- 
Loop and 3-ad primers. As in the other two methods, se- 
quence data from the amplified hairpins is obtained in two 
ways. One way is the direct sequencing of the PCR prod- 
ucts using the F-Loop primer. Another way is the cloning 
of the PCR products into a plasmid, followed by PCR 
screening of individual bacterial colonies using a primer 



specific to the plasmid vector and R-Stem primer. Positive 
PCR products are then sent for direct sequencing using 
the vector-specific primer. It should be noted that option- 
ally an extended F-Loop primer is designed that includes 
a T3 sequence extension, as described hereinabove (Fig.
16A) for the F-Stem primer. This is important in the first 
sequencing option in cases where the PCR product is too 
short for sequencing and also in order to enable the use 
of T3 primer. 

[0336] In an embodiment of the present invention, the three
methods mentioned hereinabove may be employed to
validate the expression of GAM FOLDED PRECURSOR RNA.

[0337] Reference is now made to Fig. 17A, which is a flow chart 
with a general description of the design of the microarray 
to identify expression of published miRNA oligonu- 
cleotides, and of novel GAM oligonucleotides of the 
present invention. 

[0338] A microarray that identifies miRNA oligonucleotides is
designed (Fig. 17B). The DNA microarray is prepared by Agi-
lent according to their SurePrint Procedure (reference de- 
scribing their technology can be obtained from the Agilent 
website, http://www.agilent.com). In this procedure, the 
oligonucleotide probes are synthesized on the glass sur- 



face. Other methods can also be used to prepare such mi- 
croarray including the printing of pre-synthesized 
oligonucleotides on glass surface or using the pho- 
tolithography method developed by Affymetrix (Lockhart 
DJ et al., Nat Biotechnol. 14:1675-1680 (1996)). The 
60-mer sequences from the design are synthesized on the 
DNA microarray. The oligonucleotides on the microarray,
termed "probes", are of the exact sequence as the
designed 60-mer sequences. Importantly, the 60-mer
sequences and the probes are in the sense orientation with
regard to the miRNA oligonucleotides. Next, a cDNA
library is created from size-fractionated RNA, amplified,
and converted back to RNA (Fig. 17C). The resulting RNA
is termed "cRNA". The conversion to RNA is done using a
T7 RNA polymerase promoter found on the 3' adaptor
(Fig. 17C; T7 NcoI-RNA-DNA 3'Adaptor). Since the
conversion to cRNA is done in the reverse direction compared
to the orientation of the miRNA oligonucleotides, the
cRNA is reverse complementary to the probes and is able
to hybridize to them. This amplified RNA is hybridized with
the microarray that identifies miRNA oligonucleotides, and
the results are analyzed to indicate the relative level of
miRNA oligonucleotides (and hairpins) that are present in
the total RNA of the tissue (Fig. 18).

[0339] Reference is now made to Fig. 17B, which describes how 
the microarray to identify miRNA oligonucleotides is de- 
signed. miRNA oligonucleotide sequences or potential 
predicted miRNA oligonucleotides are generated by using 
known or predicted hairpins as input. Overlapping poten- 
tial miRNA oligonucleotides are combined to form one 
larger sub-sequence within a hairpin. 

[0340] To generate non-expressed sequences (tails), artificial
sequences are generated that are 40 nts in length, which do
not appear in the respective organism genome, do not
have greater than 40% homology to sequences that appear
in the genome, and with no 15-nucleotide window that
has greater than 80% homology to sequences that appear
in the genome.
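
The tail criteria above can be sketched as a simple filter. The text does not define how homology is measured; the sketch below assumes ungapped percent identity against every same-length window of a reference sequence, which is a naive stand-in suitable only for small toy inputs. All function names and default parameters other than the 40 nt length, the 40% limit, and the 15 nt / 80% window limit are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def max_identity(query: str, genome: str) -> float:
    """Highest ungapped identity of query against any window of the genome."""
    n = len(query)
    if len(genome) < n:
        return 0.0
    return max(identity(query, genome[i:i + n])
               for i in range(len(genome) - n + 1))


def tail_is_acceptable(tail: str, genome: str, tail_length: int = 40,
                       full_limit: float = 0.40, window: int = 15,
                       window_limit: float = 0.80) -> bool:
    """Apply the three tail criteria using ungapped identity (an assumption)."""
    if len(tail) != tail_length or tail in genome:
        return False
    if max_identity(tail, genome) > full_limit:
        return False
    # No 15-nt window of the tail may exceed the window identity limit.
    return all(max_identity(tail[i:i + window], genome) <= window_limit
               for i in range(len(tail) - window + 1))


# Toy reference sequence, for illustration only.
toy_genome = "A" * 60
```

A production design would presumably rely on an indexed alignment tool rather than this exhaustive scan.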

[0341] To generate probe sequences, the most probable miRNA
oligonucleotide sequences are placed at position 3 (from
the 5' end) of the probe. Then, a tail sub-sequence is
attached to the miRNA oligonucleotide sequence such that
the combined sequence length meets the required
probe length (60 nts for Agilent microarrays).
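
Probe assembly as described above can be sketched as follows. The sketch assumes that "position 3" is 1-based, so that two tail nucleotides precede the miRNA oligonucleotide sequence and the remainder of the tail follows it; the text does not state this explicitly, and the function name `build_probe` is hypothetical.

```python
def build_probe(mirna: str, tail: str, probe_length: int = 60,
                start_position: int = 3) -> str:
    """Place mirna at a 1-based start position and pad with tail sequence."""
    lead = start_position - 1               # tail bases before the miRNA
    needed = probe_length - len(mirna)      # total tail bases required
    if needed < lead:
        raise ValueError("miRNA sequence is too long for this probe length")
    if len(tail) < needed:
        raise ValueError("tail is too short to complete the probe")
    return tail[:lead] + mirna + tail[lead:needed]


# Toy 22-nt sequence and 40-nt tail, for illustration only.
toy_mirna = "ACGT" * 5 + "AC"
toy_tail = "G" * 40
probe = build_probe(toy_mirna, toy_tail)
```

Because the tail is trimmed to whatever length is required, the same routine accommodates miRNA oligonucleotide sequences of different lengths.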

[0342] The tails method provides better specificity compared to
the triplet method. In the triplet method, it cannot be
ascertained that the design sequence, and not an
uncontrolled window from the triplet probe sequence, was
responsible for hybridizing to the probe. Further, the tails
method allows the use of different lengths for the
potential predicted miRNA oligonucleotide (or combined,
overlapping miRNA oligonucleotides).

[0343] Hundreds of control probes were examined in order to
ensure the specificity of the microarray. Negative controls
contain probes which should have a low intensity signal. For
other control groups, the concentration of certain specific
groups of interest in the library is monitored. Negative
controls include tail sequences and non-hairpin
sequences. Other controls include mRNA for coding genes,
tRNA, and snoRNA.

[0344] For each probe that represents known or predicted miRNA 
oligonucleotides, additional mismatch probes were as- 
signed in order to verify that the probe intensity is due to 
perfect match (or as close as possible to a perfect match) 
binding between the target miRNA oligonucleotide cRNA 
and its respective complementary sequence on the probe. 
Mismatches are generated by changing nucleotides in dif- 
ferent positions on the probe with their respective com- 
plementary nucleotides (A <-> T, G <-> C, and vice 



versa). Mismatches in the tail region should not generate a 
significant change in the intensity of the probe signal, 
while mismatches in the miRNA oligonucleotide sequences 
should induce a drastic decrease in the probe intensity 
signal. Mismatches at various positions within the miRNA 
oligonucleotide sequence enable us to detect whether the 
binding of the probe is a result of perfect match or, alter- 
natively, nearly perfect match binding. 
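
The mismatch rule described above (replacing a nucleotide with its complement, A <-> T and G <-> C) can be sketched as follows. Positions are assumed to be 1-based positions on the probe; the function name `mismatch_probe` is hypothetical.

```python
from typing import Iterable

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(probe: str, positions: Iterable[int]) -> str:
    """Return a copy of probe with the bases at the given 1-based positions
    replaced by their complementary nucleotides."""
    bases = list(probe.upper())
    for pos in positions:
        bases[pos - 1] = COMPLEMENT[bases[pos - 1]]
    return "".join(bases)


# Toy probe fragment, for illustration only.
original = "ACGTACGT"
mutated = mismatch_probe(original, [1, 4])
changed = sum(a != b for a, b in zip(original, mutated))
```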

[0345] Based on the above scheme, we designed a DNA microar- 
ray prepared by Agilent using their SurePrint technology. 
Table 11 is a detailed list of microarray chip probes.

[0346] KNOWN miRNA OLIGONUCLEOTIDES: 

[0347] The miRNA oligonucleotides and their respective
precursor sequences are taken from Sanger Database to yield a
total of 186 distinct miRNA oligonucleotide and precursor 
pairs. The following different probes are constructed: 

[0348] 1. SINGLE miRNA OLIGONUCLEOTIDE PROBES: 

[0349] From each precursor, a 26-mer containing the miRNA
oligonucleotide was taken, and 3 probes were assigned for
each extended miRNA oligonucleotide sequence: 1. the
26-mer is at the 5' end of the 60-mer probe; 2. the 26-mer
is at the 3' end of the 60-mer probe; 3. the 26-mer is in
the middle of the 60-mer probe. Two different 34-mer
subsequences from the design tails are attached to the
26-mer to complete the 60-mer probe. For a subset of 32
Single miRNA oligonucleotide probes, six additional
mismatch mutation probes were designed:
[0350] 4 block mismatches at 5' end of the miRNA oligonu- 
cleotide; 

[0351] 6 block mismatches at 3' end of the miRNA oligonu- 
cleotide; 

[0352] 1 mismatch at position 10 of the miRNA oligonucleotide;

[0353] 2 mismatches at positions 8 and 17 of the miRNA 

oligonucleotide; 
[0354] 3 mismatches at positions 6, 12 and 18 of the miRNA 

oligonucleotide; and 
[0355] 6 mismatches at different positions out of the miRNA 

oligonucleotide. 
[0356] 2. DUPLEX miRNA OLIGONUCLEOTIDE PROBES: 

[0357] From each precursor, a 30-mer containing the miRNA
oligonucleotide was taken, then duplicated to obtain a
60-mer probe. For a subset of 32 probes, three
additional mismatch mutation probes were designed:

[0358] 2 mismatches on the first miRNA oligonucleotide; 



[0359] 2 mismatches on the second miRNA oligonucleotide; and 

[0360] 2 mismatches on each of the miRNA oligonucleotides. 

[0361] 3. TRIPLET miRNA OLIGONUCLEOTIDE PROBES: 

[0362] Following Krichevsky's work (Krichevsky et al., RNA 

9:1274-1281 (2003)), head to tail ~22-mer length miRNA 
oligonucleotide sequences were attached to obtain 
60-mer probes containing up to three repeats of the same 
miRNA oligonucleotide sequence. For a subset of 32 
probes, three additional mismatch mutation probes were 
designed: 

[0363] 2 mismatches on the first miRNA oligonucleotide; 
[0364] 2 mismatches on the second miRNA oligonucleotide; and 
[0365] 2 mismatches on each of the miRNA oligonucleotides. 
[0366] 4. PRECURSOR WITH miRNA OLIGONUCLEOTIDE PROBES:

[0367] For each precursor, a 60-mer containing the miRNA
oligonucleotide was taken.
[0368] 5. PRECURSOR WITHOUT miRNA OLIGONUCLEOTIDE 

PROBES: 

[0369] For each precursor, a 60-mer containing no more than a
16-mer of the miRNA oligonucleotide was taken. For a
subset of 32 probes, additional mismatch probes contain- 



ing four mismatches were designed. 
[0370] CONTROL GROUPS: 

[0371] 1. 100 60-mer sequences from representative ribosomal
RNAs. 

[0372] 2. 85 60-mer sequences from representative tRNAs.
[0373] 3. 19 60-mer sequences from representative snoRNA.

[0374] 4. 294 random 26-mer sequences from human genome
not contained in published or predicted precursor se- 
quences, placing them at the probe's 5' and attached 
34-mer tail described above. 

[0375] 5. Negative Control: 182 different 60-mer probes con-
tained different combinations of 10 nt-long sequences, in 
which each 10 nt-long sequence is very rare in the human 
genome, and the 60-mer combination is extremely rare. 

[0376] PREDICTED GAM RNAs: 

[0377] There are 8642 pairs of predicted GAM RNA and their re- 
spective precursors. From each precursor, a 26-mer con- 
taining the GAM RNA was placed at the 5' of the 60-mer 
probe and a 34-mer tail was attached to it. For each
predicted probe, a mutation probe with 2 mismatches at
positions 10 and 15 of the GAM RNA was added.

[0378] For a subset of 661 predicted precursors, up to 2 probes 



each containing one side of the precursor including any 
possible GAM RNA in it were added. 
[0379] Microarray analysis: 

[0380] Based on known miRNA oligonucleotide probes, a
preferred position of the miRNA oligonucleotide on the probe
was evaluated, and the hybridization conditions and the
amount of cRNA were adjusted to optimize microarray
sensitivity and specificity. Negative controls are used to
calculate background signal mean and standard deviation.
Different probes of the same miRNA oligonucleotide are
used to calculate signal standard deviation as a function
of the signal.

[0381] For each probe, BG_Z_Score = (log(probe signal) - mean
of log(negative control signal))/(standard deviation of
log(negative control signal)) was calculated.

[0382] For a probe with a reference probe with 2 mismatches on
the miRNA oligonucleotide, MM_Z_Score =
(log(perfect match signal) - log(reference mismatch
signal))/(standard deviation of log(signals) at the reference
mismatch log(signal)) was calculated.

[0383] BG_Z_Score and MM_Z_Score are used to decide whether 
the probe is on and its reliability. 
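
The two scores can be sketched numerically as follows. The text does not specify the base of the logarithm or whether a sample or population standard deviation is used; the sketch assumes the natural logarithm and the sample standard deviation. The signal-dependent standard deviation used for MM_Z_Score is passed in as an argument, since its estimation from replicate probes is not detailed here. Function and parameter names are hypothetical.

```python
import math
from statistics import mean, stdev


def bg_z_score(probe_signal: float, negative_control_signals) -> float:
    """Z score of a probe's log signal relative to the negative controls."""
    logs = [math.log(s) for s in negative_control_signals]
    return (math.log(probe_signal) - mean(logs)) / stdev(logs)


def mm_z_score(perfect_match_signal: float, mismatch_signal: float,
               log_signal_std: float) -> float:
    """Z score of a perfect-match probe relative to its mismatch reference.

    log_signal_std is the standard deviation of log(signal) expected at the
    mismatch probe's signal level (assumed to be estimated elsewhere).
    """
    return (math.log(perfect_match_signal)
            - math.log(mismatch_signal)) / log_signal_std


# Toy signals chosen so that the log values are 1, 2 and 3.
toy_controls = [math.exp(1), math.exp(2), math.exp(3)]
bg = bg_z_score(math.exp(5), toy_controls)
mm = mm_z_score(math.exp(4), math.exp(2), 0.5)
```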

[0384] Reference is now made to Fig. 17C, which is a flowchart 



describing how the cDNA library was prepared from RNA 
and amplified. The general procedure was performed as 
described previously (Elbashir SM, Lendeckel W, Tuschl T. 
RNA interference is mediated by 21- and 22-nucleotide 
RNAs. Genes Dev. 2001 15:188-200) with several modifi- 
cations, which will be described hereinbelow. 
[0385] First, the starting material is prepared. Instead of starting 
with standard total RNA, the total RNA was
size-fractionated using a YM-100 Microcon column (Millipore
Corporation, Billerica, Massachusetts, USA) in the present 
protocol. Further, the present protocol uses human tissue 
or cell lines instead of a Drosophila in vitro system as 
starting materials. Finally, 3 micrograms of size- 
fractionated total RNA was used for the ligation of adaptor 
sequences. 

[0386] Libraries used for microarray hybridization are listed 

hereinbelow: "A" library is composed of a mix of libraries 
from Total HeLa YM100 RNA and Nuclear HeLa YM100 
RNA; "B" library is composed of a mix of libraries from 
Total HEK293 YM100 RNA and Nuclear HEK293 YM100 
RNA; "C" library is composed of a mix of YM100 RNA li- 
braries from Total PC3, Nuclear PC3 and from PC3 cells in 
which Dicer expression was transiently silenced by Dicer 



specific siRNA; "D" library is prepared from YM100 RNA 
from Total Human Brain (Ambion Cat#7962); "E" library is 
prepared from YM100 RNA from Total Human Liver 
(Ambion Cat#7960); "F" library is prepared from YM100 
RNA from Total Human Thymus (Ambion Cat#7964); "G" 
library is prepared from YM100 RNA from Total Human 
Testis (Ambion Cat#7972); and "H" library is prepared 
from YM100 RNA from Total Human Placenta (Ambion 
Cat#7950). 

[0387] Library letters appended by a numeral "1" or "2" are
digested by XbaI (NEB); Library letters affixed by a numeral
"3" are digested by XbaI and SpeI (NEB); Library letters
appended by a numeral "4" are digested by XbaI and the
transcribed cRNA is then size-fractionated by YM30,
retaining the upper fraction consisting of 60 nts and longer;
Library letters affixed by a numeral "5" are digested by
XbaI and the transcribed cRNA is then size-fractionated
by YM30, retaining the flow-through fraction, which is
subsequently concentrated with YM10 and consists of
30 nts-60 nts; Library letters affixed by a numeral "6" are
digested by XbaI and the DNA is fractionated on a 13%
native acrylamide gel from 40-60 nt, electroeluted on a
GeBaFlex Maxi column (GeBa Israel), and lyophilized;
Library letters affixed by a numeral "7" are digested by
XbaI and the DNA is fractionated on a 13% native
acrylamide gel from 80-160 nt, electroeluted and lyophilized.

[0388] Next, unique RNA-DNA hybrid adaptor sequences with a 
T7 promoter were designed. This step is also different 
than other protocols that create libraries for microarrays. 
Most protocols use complements to the polyA tails of 
mRNA with a T7 promoter to amplify only mRNA. How- 
ever, in the present invention, adaptors are used to am- 
plify all of the RNA within the size-fractionated starting 
material. The adaptor sequences are ligated to the size- 
fractionated RNA as described in Fig. 13, with subsequent 
gel-fractionation steps. The RNA is then converted to first 
strand cDNA using reverse transcription. 

[0389] Next, the cDNA is amplified using PCR with
adaptor-specific primers. At this point, there is the optional
step of removing the tRNA, which is likely to be present
because of its low molecular weight, but may add background
noise in the present experiments. All tRNAs contain the
sequence CCA at their 3' end, and the adaptor contains TGG
(UGG in its RNA portion) at its 5' end. Together, this
sequence (CCATGG) is the target site for NcoI restriction
digestion. Thus, adding the restriction enzyme NcoI either
before or during PCR amplification will effectively prevent
the exponential amplification of the cDNA sequences that are
complements of the tRNAs.
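
The junction logic can be illustrated with a short check. Read 5' to 3', a tRNA 3' terminus (CCA) ligated to the 5' end of the T7 NcoI 3' adaptor (UGG) yields CCATGG at the DNA level, which is the NcoI recognition site. The sketch below tests whether that site spans a ligation junction; the RNA sequences used are toy examples and the function names are hypothetical.

```python
NCOI_SITE = "CCATGG"  # NcoI recognition sequence


def to_dna(seq: str) -> str:
    """Upper-case a sequence and convert U to T."""
    return seq.upper().replace("U", "T")


def site_spans_junction(rna: str, adaptor: str, site: str = NCOI_SITE) -> bool:
    """True if the site overlaps the RNA/adaptor ligation junction."""
    left, right = to_dna(rna), to_dna(adaptor)
    joined = left + right
    j = len(left)
    # The site must start before the junction and end after it.
    for start in range(max(0, j - len(site) + 1), j):
        if joined[start:start + len(site)] == site:
            return True
    return False


# 3' adaptor sequence as listed in the text (ribonucleotides written as RNA).
ADAPTOR_3P = "UGGCCTATAGTGAGTCGTATTA"

trna_like = "GGGAGACCA"                 # toy RNA ending in CCA
mirna_like = "UGAGGUAGUAGGUUGUAUAGUU"   # toy RNA not ending in CCA
```

Under these assumptions only ligation products of RNAs ending in CCA acquire the site at the junction, so digestion would selectively affect tRNA-derived cDNA.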

[0390] The amplified DNA is restriction enzyme-digested with
XbaI (and, optionally, with Pst or SpeI) to remove the
majority of the adaptor sequences that were initially added to
the RNA. Use of the first set of RNA-DNA hybrid adaptors
listed below, the first two sets of primers listed below,
and XbaI restriction digest yields the following cRNA
products: 5'GGCCA - PRE/miRNA - UAUCUAG, where PRE is
defined as GAM PRECURSOR (palindrome). Use of the second
set of RNA-DNA hybrid adaptors listed below, the
second set of primers listed below, and XbaI and Pst
restriction digest yields the following, smaller cRNA
products: 5'GG - PRE/miRNA - C*.

[0391] Then, cDNA is transcribed to cRNA utilizing an RNA
polymerase, e.g. T7, as dictated by the promoter incorporated
in the adaptor. cRNA may be labeled in the course of
transcription with aminoallyl or fluorescent nucleotides such
as Cy3- or Cy5-UTP and CTP, among other labels, and
cRNA sequences thus transcribed and labeled are
hybridized with the microarray.

[0392] The following RNA-DNA hybrid adaptors are included in
the present invention:

[0393] 1. Name: T7 NcoI-RNA-DNA 3'Adapter

[0394] Sequence: 5' (5phos) rUrGrGCCTATAGTGAGTCGTATTA
(3InvdT) 3'

[0395] 2. Name: 5Ada RNA-DNA XbaBseRI

[0396] Sequence: 5' AAAGGAGGAGCTCTAGrArUrA 3' or optionally:

[0397] 3. Name: 5Ada MC RNA-DNA PstAtaBser

[0398] Sequence: 5' CCTAGGAGGAGGACGTCTGrCrArG 3'

[0399] 4. Name: 3'Ada nT7 MC RNA-DNA

[0400] Sequence: 5' (5phos) rCrCrUATAGTGAGTCGTATTATCT
(3InvdT) 3'

[0401] The following DNA primers are included in the present in- 
vention: 

[0402] 1. Name: T7 NcoI-RT-PCR primer

[0403] Sequence: 5' TAATACGACTCACTATAGGCCA 3'

[0404] 2. Name: T7NheI SpeI-RT-PCR primer

[0405] Sequence: 5' GCTAGCACTAGTTAATACGACTCACTATAGGCCA 3'

[0406] 3. Name: 5Ada XbaBseRI Fwd

[0407] Sequence: 5' AAAGGAGGAGCTCTAGATA 3'

[0408] 4. Name: Pst-5AdaXbaBseRI Fwd

[0409] Sequence: 5' TGACCTGCAGAAAGGAGGAGCTCTAGATA 3'

[0410] or optionally:

[0411] 5. Name: 5Ada MC PstAtaBser fwd

[0412] Sequence: 5' ATCCTAGGAGGAGGACGTCTGCAG 3'

[0413] 6. Name: RT nT7 MC XbaI

[0414] Sequence: 5' GCTCTAGGATAATACGACTCACTATAGG 3'

[0415] Reference is now made to Fig. 18A, which demonstrates
the detection of known miRNA oligonucleotides and of
novel GAM oligonucleotides, using a microarray
constructed and operative in accordance with a preferred
embodiment of the present invention. Based on negative
control probe intensity signals, we evaluated the
background, non-specific, logarithmic intensity distribution,
and extracted its mean, designated BG_mean, and standard
deviation, designated BG_std. In order to normalize
intensity signals between different microarray
experiments, a Z score, which is a statistical measure that
quantifies the distance (measured in standard deviations) that
a data point is from the mean of a data set, was calculated
for each probe with respect to the negative control using
the following Z score formula: Z = (logarithm of probe
signal - BG_mean)/BG_std. We performed microarray
experiments using RNA extracted from several different tissues
and we calculated each probe's maximum Z score. Fig.
18A shows the percentages of known, predicted and
negative control groups that have a higher max Z score than a
specified threshold as a function of max Z score
threshold. Based on the negative control group plot, included
as a reference, a probe with a max Z score greater than 4
is considered a reliable probe with meaningful signals. The
sensitivity of our method was demonstrated by the detection
of almost 80% of the known published miRNA oligonucleotides
in at least one of the examined tissues. At a threshold of 4
for the max Z score, 28% of the predicted GAMs are
present in at least one of the examined tissues.
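
The quantity plotted in Fig. 18A, as described above, can be sketched as the fraction of probes in a group whose maximum Z score across tissues exceeds a threshold. The data layout (a mapping from probe name to a list of per-tissue Z scores) and the function names are assumptions made for illustration; the values are toy numbers, not results.

```python
def max_z_scores(z_by_probe: dict) -> dict:
    """Maximum Z score of each probe across all hybridized tissues."""
    return {probe: max(scores) for probe, scores in z_by_probe.items()}


def fraction_above(max_scores, threshold: float) -> float:
    """Fraction of probes whose max Z score is greater than the threshold."""
    scores = list(max_scores)
    if not scores:
        return 0.0
    return sum(s > threshold for s in scores) / len(scores)


# Toy per-tissue Z scores for four hypothetical probes.
toy_z = {
    "probe_1": [1.0, 5.2],
    "probe_2": [0.3, 2.0],
    "probe_3": [4.5, 4.1],
    "probe_4": [3.9, 0.1],
}
toy_max = max_z_scores(toy_z)
toy_fraction = fraction_above(toy_max.values(), threshold=4.0)
```

Evaluating `fraction_above` over a range of thresholds for each probe group would give one curve per group.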

[0416] Reference is now made to Fig. 18B, which is a line graph 
showing specificity of hybridization of a microarray con- 
structed and operative in accordance with a preferred em- 
bodiment of the present invention and described herein- 
above with reference to Figs. 17A-17C. 

[0417] The average signal of known miRNA oligonucleotides in
Library A2 is presented on a logarithmic scale as a
function of the following probe types under two different
hybridization conditions, 50°C and 60°C: perfect match (PM),
six mismatches on the tail (TAIL MM), one mismatch on
the miRNA oligonucleotide (1MM), two separate
mismatches on the miRNA oligonucleotide (2MM), and three
separate mismatches on the miRNA oligonucleotide (3MM).
The relative equality of perfect match probes and probes
with the same miRNA oligonucleotide but many
mismatches over the tail attests to the independence between
the tail and the probe signal. At a hybridization
temperature of 60°C, one mismatch in the middle of the miRNA
oligonucleotide is enough to dramatically reduce the
probe signal. Conducting chip hybridization at 60°C
ensures that a probe has a very high specificity.

[0418] It is appreciated that these results demonstrate the
specificity of the microarray of the present invention in
detecting expression of miRNA oligonucleotides.

[0419] Reference is now made to Fig. 18C, which is a summary 
table demonstrating detection of known miRNA oligonu- 
cleotides using a microarray constructed and operative in 
accordance with a preferred embodiment of the present 
invention and described hereinabove with reference to 



Figs. 17A-17C. 

[0420] Labeled cRNA from HeLa cells and Human Liver, Brain,
Thymus, Placenta, and Testes was used for 6 different
hybridizations. The table contains the quantitative values
obtained for each miRNA oligonucleotide probe. For each
miRNA oligonucleotide, the highest value (or values) is
given in bold font while lower values are given in regular
font. Results for MIR-124A, MIR-9 and MIR-122A
are exactly as expected from previous studies. The
References column contains the relevant references in the
published literature for each case. In addition to these miRNA
oligonucleotides, the table shows other known miRNA
oligonucleotides that are expressed in a tissue-specific
manner. The results indicate that MIR-128A, MIR-129 and
MIR-128B are highly enriched in Brain; MIR-194, MIR-148
and MIR-192 are highly enriched in Liver; MIR-96,
MIR-150, MIR-205, MIR-182 and MIR-183 are highly enriched
in Thymus; MIR-204, MIR-10B, MIR-154 and MIR-134 are
highly enriched in Testes; and MIR-122, MIR-210,
MIR-221, MIR-141, MIR-23A, MIR-200C and MIR-136 are
highly enriched in Placenta. In most cases, low but
significant levels are observed in the other tissues. However,
in some cases, miRNA oligonucleotides are also expressed at
relatively high levels in an additional tissue.
[0421] It is appreciated that these results reproduce previously

published studies of expression of known miRNA oligonu- 
cleotides. These results demonstrate the reliability of the 
microarray of the present invention in detecting expres- 
sion of published miRNA oligonucleotides, and of novel 
GAM oligonucleotides of the present invention. 
DETAILED DESCRIPTION OF TABLES 

[0422] Table 1 comprises data relating the SEQ ID NO of
oligonucleotides of the present invention to their corresponding
GAM NAME, and contains the following fields: GAM SEQ- 
ID: GAM SEQ ID NO, as in the Sequence Listing; GAM 
NAME: Rosetta Genomics Ltd. nomenclature (see below); 
GAM RNA SEQUENCE: Sequence (5' to 3') of the mature, 
"diced" GAM RNA; GAM ORGANISM: identity of the organ- 
ism encoding the GAM oligonucleotide; GAM POS: Dicer- 
cut location (see below); and 

[0423] Table 2 comprises detailed textual description according 
to the description of Fig. 1 of each of a plurality of novel 
GAM oligonucleotides of the present invention, and con- 
tains the following fields: GAM NAME: Rosetta Genomics 
Ltd. nomenclature (see below); GAM ORGANISM: identity 
of the organism encoding the GAM oligonucleotide; PRE- 



CUR SEQ-ID:GAM precursor Seq-ID, as in the Sequence 
Listing; PRECURSOR SEQUENCE: Sequence (5' to 3') of the 
GAM precursor; GAM DESCRIPTION: Detailed description 
of GAM oligonucleotide with reference to Fig. 1; and 

[0424] Table 3 comprises data relating to the source and location 
of novel GAM oligonucleotides of the present invention, 
and contains the following fields: GAM NAME: Rosetta Ge- 
nomics Ltd. nomenclature (see below); PRECUR SEQ-ID: 
GAM precursor SEQ ID NO, as in the Sequence Listing; 
GAM ORGANISM: identity of the organism encoding the
GAM oligonucleotide; SOURCE: For human GAM- 
chromosome encoding the human GAM oligonucleotide, 
otherwise- accession ID (GenBank, NCBI); STRAND: Orien- 
tation of the strand, "+" for the plus strand, "-" for the 
minus strand; SRC-START OFFSET: Start offset of GAM 
precursor sequence relative to the SOURCE; SRC-END 
OFFSET: End offset of GAM precursor sequence relative to 
the SOURCE; and 

[0425] Table 4 comprises data relating to GAM precursors of 

novel GAM oligonucleotides of the present invention, and 
contains the following fields: GAM NAME: Rosetta Ge- 
nomics Ltd. nomenclature (see below); PRECUR SEQ-ID: 
GAM precursor Seq-ID, as in the Sequence Listing; GAM 



ORGANISM: identity of the organism encoding the GAM 
oligonucleotide; PRECURSOR-SEQUENCE: GAM precursor 
nucleotide sequence (5' to 3'); GAM FOLDED PRECURSOR 
RNA: Schematic representation of the GAM folded precur- 
sor, beginning 5' end (beginning of upper row) to 3' end 
(beginning of lower row), where the hairpin loop is
positioned at the right part of the drawing; and

[0426] Table 5 comprises data relating to GAM oligonucleotides 
of the present invention, and contains the following fields: 
GAM NAME: Rosetta Genomics Ltd. nomenclature (see be- 
low); GAM ORGANISM: identity of the organism encoding 
the GAM oligonucleotide; GAM RNA SEQUENCE: Sequence 
(5' to 3') of the mature, "diced" GAM RNA; PRECUR SEQ-ID: 
GAM precursor Seq-ID, as in the Sequence Listing; GAM 
POS: Dicer-cut location (see below); and 

[0427] Table 6 comprises data relating SEQ ID NO of the GAM

target gene binding site sequence to TARGET gene name 
and target binding site sequence, and contains the follow- 
ing fields: TARGET BINDING SITE SEQ-ID: Target binding 
site SEQ ID NO, as in the Sequence Listing; TARGET
ORGANISM: identity of the organism encoding the TARGET gene;
TARGET: GAM target gene name; TARGET BINDING SITE 
SEQUENCE: Nucleotide sequence (5' to 3') of the target 



binding site; and 

[0428] Table 7 comprises data relating to target-genes and bind- 
ing sites of GAM oligonucleotides of the present inven- 
tion, and contains the following fields: GAM NAME: 
Rosetta Genomics Ltd. nomenclature (see below); GAM 
ORGANISM: identity of the organism encoding the GAM 
oligonucleotide; GAM RNA SEQUENCE: Sequence (5' to 3') 
of the mature, "diced" GAM RNA; TARGET: GAM target 
gene name; TARGET REF-ID: for human target genes, the 
target accession number (RefSeq, GenBank); otherwise, 
the location of the target gene in the genome annotation; 
TARGET ORGANISM: identity of the organism encoding the 
TARGET gene; UTR: Untranslated region of binding site/s (3' 
or 5'); TARGET BS-SEQ: Nucleotide sequence (5' to 3') of 
the target binding site; BINDING SITE-DRAW: Schematic 
representation of the binding site, where the upper row 
represents the 5' to 3' sequence of the TARGET and the lower 
row represents the 3' to 5' sequence of the GAM RNA; 
GAM POS: Dicer-cut location (see below); and 

[0429] Table 8 comprises data relating to functions and utilities 
of novel GAM oligonucleotides of the present invention, 
and contains the following fields: GAM NAME: Rosetta 
Genomics Ltd. nomenclature (see below); GAM RNA SEQUENCE: 
Sequence (5' to 3') of the mature, "diced" GAM 
RNA; GAM ORGANISM: identity of the organism encoding 
the GAM oligonucleotide; TARGET: GAM target gene name; 
TARGET ORGANISM: identity of the organism encoding the 
TARGET gene; GAM FUNCTION: Description of the GAM 
functions and utilities; GAM POS: Dicer-cut location (see 
below); and 

[0430] Table 9 comprises references for GAM target genes and 
contains the following fields: TARGET: Target gene name; 
TARGET ORGANISM: identity of the organism encoding the 
TARGET gene; REFERENCES: references relating to the target 
gene; and 

[0431] Table 10 comprises data relating to novel GR (Genomic 
Record) polynucleotides of the present invention, and 
contains the following fields: GR NAME: Rosetta Genomics 
Ltd. nomenclature (see below); GR ORGANISM: identity of 
the organism encoding the GR polynucleotide; GR DE- 
SCRIPTION: Detailed description of a GR polynucleotide, 
with reference to Fig. 9; and 

[0432] Table 11 comprises data on all sequences printed on the 
microarray of the microarray experiment, as described 
hereinabove with reference to Fig. 17, and includes the 
following fields: PROBE SEQUENCE: the sequence that was 
printed on the chip; PROBE TYPE: as described in detail with 
reference to Fig. 17 in the chip design section and summarized as follows: 
Known: published miRNA sequence; Known_mis1: similar 
to published miRNA sequence, but with 1 mismatch mutation 
on the miRNA sequence; Known_mis2: similar to 
published miRNA sequence, but with 2 mismatch mutations 
on the miRNA sequence; Known_mis3: similar to 
published miRNA sequence, but with 3 mismatch mutations 
on the miRNA sequence; Known_mis4: similar to 
published miRNA sequence, but with 6 mismatch mutations 
on regions other than the miRNA sequence; Predicted: 
predicted GAM RNA sequences; Mismatch: sequences 
that are similar to predicted GAM RNA sequences 
but with 2 mismatches; Edges1: left half of GAM RNA 
sequences; Edges2: right half of GAM RNA sequences 
extended with its hairpin precursor (palindrome); Control1: 
negative control; Control2: random sequences; Control3: 
tRNA; Control4: snoRNA; Control5: mRNA; Control6: 
other; GAM RNA SEQ ID/MIR NAME: GAM oligonucleotide 
using Rosetta Genomics Ltd. nomenclature (see below) or 
published miRNA oligonucleotide terminology; GAM RNA 
SEQUENCE: Sequence (5' to 3') of the mature, "diced" GAM 
RNA; LIBRARY: the library name as defined in Fig. 17C; 
SIGNAL: Raw signal data for library; BACKGROUND Z-SCORE: 
Z-score of probe signal with respect to background, 
negative control signals; MISMATCH Z-SCORE: Z-score 
of probe signal with respect to its mismatch probe 
signal; and 

[0433] Table 12 comprises data related to the GAM RNA 
SEQUENCES included in the present invention that were 
validated by laboratory means. If the validated sequence 
appeared in more than one GAM precursor, the GAM RNA 
SEQ-ID indicated may be arbitrarily chosen. The table 
includes the following fields: VALIDATION METHOD: the 
type of validation performed on the sequence. The microarray 
validations are divided into four groups: a) "Chip 
strong" refers to GAM oligonucleotide sequences whose 
intensity (SIGNAL) on the microarray "chip" was more than 
6 standard deviations above the background intensity, 
and the differential to the corresponding mismatch inten- 
sity was more than 2 standard deviations, where in this 
case the standard deviation is of the intensity of identical 
probes; b) "Chip" refers to GAM oligonucleotide sequences 
whose intensity was more than 4 standard deviations 
above the background intensity; c) "Sequenced" 
refers to GAM oligonucleotide sequences that were 
sequenced; and d) "Chip strong, Sequenced" refers to miRNA 
oligonucleotide sequences that were both detected in the 
microarray as "Chip strong" and sequenced. "Sequenced" 
is described hereinabove with reference to Fig. 13. Other 
validations are from microarray experiments as described 
hereinabove with reference to Figs. 17A-C and 18A-C; 
SIGNAL: raw signal data; BACKGROUND Z-SCORE: Z-score 
of probe signal with respect to background, negative 
control signals; MISMATCH Z-SCORE: Z-score of 
probe signal with respect to its mismatch probe signal; 
and 

[0434] Table 13 comprises sequence data of GAMs associated 
with different bacterial infections. Each row refers to a 
specific bacterial infection, and lists the SEQ ID NOs of 
GAMs that target genes associated with that bacterial in- 
fection. The table contains the following fields: ROW#: in- 
dex of the row number; INFECTION NAME: name of the in- 
fecting organism; and SEQ ID NOs OF GAMS ASSOCIATED 
WITH INFECTION: list of sequence listing IDs of GAMs tar- 
geting genes that are associated with the specified infec- 
tion. 
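
The VALIDATION METHOD groups of Table 12 can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the table. It assumes that the BACKGROUND Z-SCORE and MISMATCH Z-SCORE fields already express the stated differences in units of standard deviations. The function name and the handling of a probe that is sequenced but only reaches the "Chip" threshold are hypothetical choices, since the text defines no such combined group.

```python
from typing import Optional

# Thresholds as stated in the description of Table 12.
CHIP_STRONG_BACKGROUND_Z = 6.0  # "more than 6 standard deviations above the background"
CHIP_STRONG_MISMATCH_Z = 2.0    # "differential to the corresponding mismatch ... more than 2"
CHIP_BACKGROUND_Z = 4.0         # "more than 4 standard deviations above the background"


def validation_method(background_z: float,
                      mismatch_z: float,
                      sequenced: bool = False) -> Optional[str]:
    """Return a Table 12 style validation label, or None if not validated.

    Hypothetical helper: assumes both z-scores are already computed.
    """
    chip_strong = (background_z > CHIP_STRONG_BACKGROUND_Z
                   and mismatch_z > CHIP_STRONG_MISMATCH_Z)
    chip = background_z > CHIP_BACKGROUND_Z

    if chip_strong and sequenced:
        return "Chip strong, Sequenced"
    if chip_strong:
        return "Chip strong"
    if sequenced:
        # Assumption: a sequenced probe that is not "Chip strong" is
        # reported as "Sequenced", whatever its weaker chip signal.
        return "Sequenced"
    if chip:
        return "Chip"
    return None
```

For example, under these assumptions a probe with a background z-score of 7.0 and a mismatch z-score of 2.5 would be labeled "Chip strong", while the same probe with a mismatch z-score of 1.0 would fall back to "Chip".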

[0435] The following conventions and abbreviations are used in 
the tables: The nucleotide "U" is represented as "T" in the 
tables; and 

[0436] GAM NAME or GR NAME are names for nucleotide 
sequences of the present invention given by the Rosetta 
Genomics Ltd. nomenclature method. All GAMs/GRs are 
designated GAMx/GRx, where x is a unique ID. 

[0437] GAM POS is the position of the GAM RNA on the GAM 
PRECURSOR RNA sequence. This position is the Dicer-cut 
location: A indicates a probable Dicer-cut location; B 
indicates an alternative Dicer-cut location. 
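
The table conventions above can be illustrated with a short Python sketch, given only as an aid for reading the tables. The function names are hypothetical, the example sequence is arbitrary and not taken from the tables, and the sketch assumes that the unique ID x in GAMx/GRx is a decimal number, which the text does not state.

```python
import re

# Assumption: the unique ID "x" is written as decimal digits.
_NAME_PATTERN = re.compile(r"^(GAM|GR)(\d+)$")


def table_to_rna(table_sequence: str) -> str:
    """Convert a sequence as printed in the tables (with "T") to RNA (with "U")."""
    return table_sequence.upper().replace("T", "U")


def parse_name(name: str) -> tuple:
    """Split a GAMx/GRx name into its prefix and numeric ID."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a GAMx/GRx name: {name!r}")
    return match.group(1), int(match.group(2))


def describe_gam_pos(code: str) -> str:
    """Expand the GAM POS code: A is probable, B is alternative."""
    meanings = {
        "A": "probable Dicer-cut location",
        "B": "alternative Dicer-cut location",
    }
    return meanings[code]
```

For instance, `table_to_rna("ACGTTGCA")` yields `"ACGUUGCA"`, and `parse_name("GAM12")` yields `("GAM", 12)`.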

[0438] All human nucleotide sequences of the present invention, 
as well as their chromosomal location and strand 
orientation, are derived from sequence records of the UCSC-hg16 
version, which is based on the NCBI Build34 database (April, 
2003). 

[0439] All bacterial sequences of the present invention as well as 
their genomic location are derived from NCBI, RefSeq 
database. 



